**The Springer Series on Demographic Methods and Population Analysis 44**

Roland Rau Christina Bohk-Ewald Magdalena M. Muszyńska James W. Vaupel

# Visualizing Mortality Dynamics in the Lexis Diagram

## **The Springer Series on Demographic Methods and Population Analysis**

Volume 44

**Series Editor** Kenneth C. Land, Duke University

In recent decades, there has been a rapid development of demographic models and methods and an explosive growth in the range of applications of population analysis. This series seeks to provide a publication outlet both for high-quality textual and expository books on modern techniques of demographic analysis and for works that present exemplary applications of such techniques to various aspects of population analysis.

Topics appropriate for the series include:


Volumes in the series are of interest to researchers, professionals, and students in demography, sociology, economics, statistics, geography and regional science, public health and health care management, epidemiology, biostatistics, actuarial science, business, and related fields.

More information about this series at http://www.springer.com/series/6449


## Visualizing Mortality Dynamics in the Lexis Diagram

Roland Rau Faculty of Economic and Social Sciences University of Rostock Rostock, Germany

Magdalena M. Muszyńska Collegium of Economic Analysis Warsaw School of Economics Warsaw, Poland

Christina Bohk-Ewald Max Planck Institute for Demographic Research Rostock, Germany

James W. Vaupel Max Planck Institute for Demographic Research Rostock, Germany

Department of Public Health University of Southern Denmark Odense C, Denmark

Duke University USA

ISSN 1389-6784 ISSN 2215-1990 (electronic) The Springer Series on Demographic Methods and Population Analysis ISBN 978-3-319-64818-7 ISBN 978-3-319-64820-0 (eBook) DOI 10.1007/978-3-319-64820-0

Library of Congress Control Number: 2017948687

© The Editor(s) (if applicable) and The Author(s) 2018. This book is an open access publication. **Open Access** This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Printed on acid-free paper

This Springer imprint is published by Springer Nature The registered company is Springer International Publishing AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

## **Acknowledgments**

The authors would like to thank the friends and colleagues who provided feedback at numerous conferences and workshops. Major demographic conferences, where parts of the material of this monograph were presented, were the IUSSP International Population Conference in Busan in 2013, the Annual Meetings of the Population Association of America in New Orleans in 2013 and in Boston in 2014, and the European Population Conference in Budapest in 2014.

The European Research Council has provided financial support under the European Community's Seventh Framework Programme (FP7/2007–2013)/ERC grant agreement no. 263744.

## **Chapter 1 Introduction: Why Do We Visualize Data and What Is This Book About?**

*Restate my assumptions: One, mathematics is the language of nature. Two, everything around us can be represented and understood through numbers. Three, if you graph the numbers of any system, patterns emerge.*

Sean Gullette as Maximilian Cohen in the movie *π* (1998).

The goal of this book is simple: We would like to show how mortality dynamics can be visualized in the so-called Lexis diagram. To appeal to as many potential readers as possible, we do not require any specialist knowledge. This approach may be disappointing: Demographers may have liked more information about the mathematical underpinnings of population dynamics on the Lexis surface as demonstrated, for instance, by Arthur and Vaupel in 1984. Statisticians would have probably preferred more information about the underlying smoothing methods that were used. Epidemiologists likewise might miss discussions about the etiology of diseases. Sociologists would have probably expected that our results were more embedded into theoretical frameworks....

We are aware of those potential shortcomings but believe that the current format can, nevertheless, provide interesting insights into mortality dynamics, and we hope our book can serve as a starting point to visualize data on the Lexis plane for those who have not used those techniques yet.

Visualizing data has become increasingly popular in recent years.<sup>1</sup> But why do we visualize data at all? Countless books on *how* to visualize data, often with a specific software tool in mind, are published every year. Maybe it seems to be too obvious, but only a few of those publications address the question of *why* one should visualize data at all. According to the ones covering the topic, the purpose of data visualization can be narrowed down to three reasons (e.g., Tukey 1977; Schumann and Müller 2000; Few 2014):

<sup>1</sup>This trend is probably best demonstrated by visualizing the popularity of the term "visualizing data" over time, for instance, via Google's *Ngram viewer*. Google Books Ngram Viewer displays the relative frequency of a search term in a corpus of books during a given time frame. Please see, for example: https://books.google.com/ngrams/graph?content=visualizing+data+&year\_start=1960&year\_end=2008


Maps and diagrams were already known in ancient Egypt, and communicating scientific results via visualization is at least 400 years old: Galileo Galilei (1613) and others published their observations of sunspots and other celestial bodies (Friendly 2008). But why has data visualization become increasingly popular only during the last 15–20 years? We argue that the key reason is the trend towards virtually ubiquitous access to electronic computing resources, enabling more and more people to participate in this endeavor. One could even call it a democratization of computing. In our opinion, we can distinguish three key developments that have played a crucial role since the 1980s and especially the 1990s. They are not listed in order of importance, nor can they be considered in isolation from each other.

Hardware: The introduction of the predecessor of all modern PCs, the IBM personal computer, in 1981 as well as of microcomputers (e.g., the "C-64") in the same era triggered a shift away from the so-called minicomputers of the 1970s<sup>2</sup> to computers that could be purchased by households of average income. The speed of the processors was too slow and the size of computer memory was too small to process data as conveniently as we can nowadays, though. The first PC had an upper limit for working memory (RAM) of 256 kB, that is about 0.000778% of the first author's current desktop workstation. If we disregard developments in cache technology, parallel processing, etc., the pure clock speed of processors is now three orders of magnitude higher than in the early 1980s. Only 20 years ago, the typical size of total RAM was about as large as the size of a *single* digital photo today. But even if there had been enough RAM and sufficient clock speed of the CPU, data storage was another limiting factor. The first hard disk with a capacity of more than one gigabyte was introduced in 1980 and cost at least US-\$ 97,000.<sup>3</sup> One thousand times the storage capacity is available now at less than US-\$ 100. This trend allowed the collection of massive data sets. To illustrate current capabilities: If we were interested in creating a data set that contains about 1000 alphabetic characters (more than enough for the name, birth date and current residence) for every person alive, we would have to invest less than US-\$ 400.<sup>4</sup> But, once again, even if we had had the affordable computer storage of today, communicating results graphically was hindered by the low resolution combined with relatively few colors of early graphics standards such as CGA and EGA. Only with the introduction and the extension of the VGA standard did high-resolution displays become feasible.

<sup>2</sup>As noted at https://en.wikipedia.org/wiki/Minicomputer#cite\_note-Smith\_1970-4 (last accessed on 13 June 2017), the New York Times wrote in 1970 that minicomputers were computers that cost less than US-\$ 25,000.
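The storage estimate for a register of every living person can be checked with a few lines of arithmetic. The sketch below is only an illustration in Python (the book's own analyses use R); it takes the assumptions from the accompanying footnote (fewer than eight billion people, one byte per letter, a 2 TB disk for under US-\$ 100) as upper bounds, and additionally assumes decimal terabytes. The function and constant names are hypothetical.

```python
# Upper-bound assumptions taken from the text's footnote; the decimal
# interpretation of "2 TB" is an extra assumption made for this sketch.
WORLD_POPULATION = 8_000_000_000   # fewer than eight billion people
CHARS_PER_PERSON = 1_000           # about 1000 alphabetic characters each
BYTES_PER_CHAR = 1                 # one byte per alphabetic letter
DISK_CAPACITY_BYTES = 2 * 10**12   # one 2 TB hard disk
DISK_PRICE_USD = 100               # price per disk is below this bound


def storage_cost(population, chars, bytes_per_char, disk_bytes, disk_price):
    """Return total bytes, number of disks needed, and the cost bound in USD."""
    total_bytes = population * chars * bytes_per_char
    disks = -(-total_bytes // disk_bytes)  # integer ceiling division
    return total_bytes, disks, disks * disk_price


total_bytes, disks, cost_bound = storage_cost(
    WORLD_POPULATION, CHARS_PER_PERSON, BYTES_PER_CHAR,
    DISK_CAPACITY_BYTES, DISK_PRICE_USD,
)
print(f"{total_bytes / 10**12:.0f} TB on {disks} disks, below US-$ {cost_bound}")
```

Because both the population and the disk price enter as upper bounds, the resulting figure is itself an upper bound, which is why the text speaks of "less than" this amount.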

- general programming languages (e.g., Python, Perl) as well as
- languages tailored or at least particularly suited for statistical programming and data analysis. The invention of the S language, started in the 1970s, was instrumental.<sup>5</sup> The most prominent example today is probably R (Ihaka 1998), but also other languages such as the now almost completely abandoned XLISP-STAT (de Leeuw 2005) facilitate(d) the visualization of data.<sup>6</sup>
- Lastly, in the area of efficient data storage, especially with the advent of "big data". Although it might be one of the most abused buzzwords currently, data sets in the gigabyte and terabyte range, partly in non-rectangular formats, have become ubiquitous. Those data can be handled by relational and nonrelational database systems that are also available under free and open source licenses (e.g., SQLite, MySQL, PostgreSQL, Cassandra).

<sup>3</sup>See: https://www-03.ibm.com/ibm/history/exhibits/storage/storage\_3380.html, last accessed on 13 June 2017.

<sup>4</sup>Assuming a world population of less than eight billion, a price for a 2TB hard disk of less than US-\$ 100 and one byte per alphabetic letter.

<sup>5</sup>Please see Appendix A in Chambers (2008) for some notes on the history of S.

<sup>6</sup>It should be mentioned, though, that Matlab (Mathworks 2017), which is not published under a free/open-source license, was and is also key for the analysis and visualization of data.

Connectivity: While the internet had already existed for more than 20 years, the introduction and rising popularity of the world wide web (WWW) was a catalyst for the exchange of information via electronic networks. This technology now allows billions of people on earth to have almost instant access to data. The speed of the internet connection, which is crucial for the exchange of information such as downloading large data sets, has also increased by at least two orders of magnitude since the middle of the 1990s when 56 kbit/s modems were the standard.

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Chapter 2 The Lexis Diagram**

*To look at 20,000 numbers and draw out their meaning is a major research enterprise in itself. Yet on the methods used in [Vaupel et al. (1985a)] all that information is contained in a single contour map.*

Nathan Keyfitz in his foreword to Vaupel et al. (1985a).

Any dynamics in vital events such as births and deaths involve change over calendar time, age, and/or cohort. The so-called Lexis diagram represents the ideal canvas to illustrate such dynamics. The Lexis diagram as we use it today consists of a Cartesian coordinate system where calendar time ("period") is depicted on the *x*-axis and age on the *y*-axis (see Fig. 2.1).<sup>1</sup> We added horizontal and vertical reference lines to facilitate orientation.

Birth cohorts move in such a diagram along the 45° line since a person is 1 year older 1 year later. Expressed differently: The current age of a person can be calculated if we subtract the birth date from the current calendar date. We used the example of three eminent demographers of the twentieth century in Fig. 2.1 to illustrate this relationship: William Brass, Ansley Coale, and Nathan Keyfitz. To be able to follow the cohorts on the 45° line, we made sure in Fig. 2.1, as well as in all other figures in this monograph, that the aspect ratio maps the length of one calendar year to exactly one age year.
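The relationship between calendar time, age, and birth cohort can be made concrete with a small sketch. The Python example below is an illustration only: it uses whole calendar years instead of exact dates (a simplification), a hypothetical person born in 1920 who dies in 2000, and a hypothetical helper named `life_line`. It computes the points of one life line and checks that age rises by exactly one year per calendar year, which corresponds to the 45° line when both axes use the same scale.

```python
def life_line(birth_year, death_year):
    """Return (calendar year, age) points of one life line on the Lexis plane.

    Age is simply the calendar year minus the birth year.
    """
    return [(year, year - birth_year) for year in range(birth_year, death_year + 1)]


# Hypothetical person, not one of the demographers shown in Fig. 2.1.
points = life_line(1920, 2000)

# Slope between consecutive points: change in age per change in calendar year.
slopes = {
    (age2 - age1) / (year2 - year1)
    for (year1, age1), (year2, age2) in zip(points, points[1:])
}

print("first point:", points[0])
print("last point:", points[-1])
print("slopes found:", slopes)
```

A slope of one only appears as a 45° line if one calendar year and one age year are drawn with the same length, which is why the aspect ratio matters.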

Of course, we are not restricted to depicting individuals on the Lexis plane. The standard approach is, indeed, to use population-level data. It is obvious that we cannot draw lines for every individual in that case. Colors are used instead to indicate the same value for the chosen statistic. While most figures in the remaining chapters show (smoothed) age-specific mortality or its time derivative, we opted to illustrate the basic approach of Lexis surface maps by depicting the population size of the United States for women and men combined from 1900 until 2010 for ages 0–110 in Fig. 2.2. Thus, we have 111 × 111 = 12,321 individual data points. They are fewer than the 20,000 mentioned by Keyfitz in Vaupel et al. (1985a) but considerably more than the median number of entries in data matrices for statistical graphics found by Tufte (2003) in various scientific and non-scientific publications. Tufte, who was described as the "da Vinci of Data" by The New York Times (Deborah 1998), states in a related book (Tufte 2001, p. 166): "Data graphics should often be based on large rather than small data matrices and have a high rather than low data density. More information is better than less information, especially when the marginal costs of handling and interpreting additional information are low, as they are for most graphics."

<sup>1</sup>It should be noted that the Lexis diagram can be considered to represent an example of "Stigler's law of eponymy" that states "No scientific discovery is named after its original discoverer." Please see Vandeschrick (2001) for a discussion about the problem of calling the diagram used in this book a "Lexis diagram".

**Fig. 2.1** An example of a Lexis diagram with individual life lines for William Brass, Ansley J. Coale, and Nathan Keyfitz

In our Lexis maps we employed a color scheme reminiscent of geographic maps where green colors indicate lower values and brown colors are used for high "altitudes". Analogously to standard maps, we added contour lines to emphasize areas of equal elevation, which translates to the same number of people in our figure.

**Fig. 2.2** An example of a Lexis surface depicting the population size of the United States by calendar year and age (Source: Own illustration based on data from the Human Mortality Database 2017)

Depicting mortality, fertility or other population characteristics in the Lexis diagram provides a useful framework to analyze data for the presence of age-, period-, and cohort- ("APC") effects. The major problem of standard statistical approaches (e.g., regression analysis) in this area is the so-called *Identification Problem*, which refers to the exact linear dependence among the three dimensions: age plus cohort (year of birth) equals period. Various methods have been introduced (e.g., constraining the parameters in a regression setting) but "there is no magic solution" (Wilmoth 2006, p. 235).<sup>2</sup> With our surface maps, we suggest instead a graphical approach that can be used for questions such as "[w]hether mortality improvements takes place by cohorts or by periods" (Keyfitz in Vaupel et al. 1985a, p. ix).

<sup>2</sup>Please refer to this article also for a systematic overview of APC models used in demographic research.
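The identification problem can be illustrated numerically. The following Python sketch is not a method used in this book; it assumes a purely linear predictor with one coefficient each for age, period, and cohort, and all coefficient values are hypothetical. Because cohort equals period minus age, shifting the three coefficients by an arbitrary constant `k` (plus for age and cohort, minus for period) leaves every fitted value unchanged, so the data cannot distinguish between the two sets of coefficients.

```python
def linear_predictor(age, period, coef_age, coef_period, coef_cohort):
    """Linear age-period-cohort predictor; cohort is implied by period and age."""
    cohort = period - age
    return coef_age * age + coef_period * period + coef_cohort * cohort


# Small illustrative grid of (age, year) cells on the Lexis plane.
grid = [(age, year) for year in range(1950, 1961) for age in range(0, 11)]

base = (0.08, -0.02, 0.01)   # hypothetical coefficients: age, period, cohort
k = 0.25                     # arbitrary shift
shifted = (base[0] + k, base[1] - k, base[2] + k)

# Largest difference between the two sets of fitted values over the grid.
max_gap = max(
    abs(linear_predictor(a, y, *base) - linear_predictor(a, y, *shifted))
    for a, y in grid
)
print(f"largest difference in fitted values: {max_gap:.2e}")
```

Any remaining difference is floating-point rounding: the shift adds `k * (age - period + cohort)`, which is zero in every cell.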

**Fig. 2.3** "Ideal" age-, period-, and cohort-effects on the Lexis surface

Figure 2.3 gives an overview of how age-, period-, and cohort effects would ideally look on the Lexis surface. The same color indicates the same value in the variable of interest (e.g., death rates). The left panel represents "pure" age effects. That means that the only variation in the variable of interest takes place across the age dimension, regardless of calendar year or cohort. The panel in the middle denotes "pure" period effects, i.e., the same values are measured at all ages but they differ along the calendar time/period dimension ("Year"). Finally, the panel on the right illustrates how a surface map would look if (birth) cohorts alone were driving the development in the variable of interest. The same color along the 45° line shows that each cohort has its own characteristic value of the variable of interest, which does not change throughout the life course. Obviously, those are idealized and simplified representations. We expect to find interactions of these three forces rather than such "pure" effects. Furthermore, we should acknowledge the biggest drawback of our method: In contrast to other methods of APC analysis, our visual approach does not attribute any numerical value to each of those effects. Hence, one can neither compare various effects with each other nor conduct significance tests that are typical of regression analyses and other standard methods in statistics.
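The three idealized patterns can be reproduced on a toy grid. The sketch below is a Python illustration under stated assumptions: the grid of five ages and five years is arbitrary, the cell values are just the age, the year, or the birth year (not death rates), and the helper names `surface` and `show` are hypothetical. It prints each surface with the highest age on top, mimicking the orientation of a Lexis diagram.

```python
AGES = range(0, 5)          # small illustrative grid, not the book's data
YEARS = range(2000, 2005)


def surface(value_of):
    """Build a Lexis surface as a list of rows (one per age) across all years."""
    return [[value_of(age, year) for year in YEARS] for age in AGES]


age_surface = surface(lambda age, year: age)            # varies with age only
period_surface = surface(lambda age, year: year)        # varies with year only
cohort_surface = surface(lambda age, year: year - age)  # varies with birth year only


def show(name, rows):
    """Print a surface with the highest age in the top row."""
    print(name)
    for row in reversed(rows):
        print(" ".join(f"{value:5d}" for value in row))


for name, rows in [("age", age_surface), ("period", period_surface),
                   ("cohort", cohort_surface)]:
    show(name, rows)
```

In the printed output, equal values form horizontal bands for age effects, vertical bands for period effects, and diagonals for cohort effects.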

We are not the first to illustrate demographic phenomena in three dimensions, i.e., *either* on the Lexis plane using colors to indicate the third dimension *or* by wireframe plots. An interesting overview of the history of such "Frequency Surfaces and Isofrequency Lines" is given in Caselli and Vallin (2006). They cite Luigi Perozzo's depiction of the change in the Swedish age pyramid in 1880, based on a diagram by Gerard Van Den Berg (1860), as one of the earliest examples. We have reproduced Perozzo's diagram in Fig. 2.4. About 60 years later, Pierre Delaporte used such wireframes to depict French mortality (1938) and contour lines for European mortality (1942).

An explicit case of using such plots to separate age-, period-, and cohort-effects from each other can be found in Thomas Pullum's article on US fertility published in 1980. A few years later, the population program at the International Institute for Applied Systems Analysis (IIASA) in Laxenburg, Austria, turned out to be an incubator for advancing the display of population dynamics on the Lexis plane. Vaupel, Yashin, Caselli, and others introduced colored/shaded contour maps to depict, for example, population size, mortality, or birth rates (e.g., Vaupel et al. 1985a,b, 1987; Caselli et al. 1985; Gambill and Vaupel 1985). The "democratization" effort described in the introductory chapter was also mirrored in the late 1990s for Lexis surfaces: Kirill Andreev not only developed the user-friendly software Lexis to analyze demographic trends in Denmark and other highly developed countries (Vaupel et al. 1997; Andreev 2002) but also shared it freely with anyone interested.<sup>3</sup> Although the software was a milestone for the creation of Lexis surface maps, almost no one uses it anymore. The aforementioned specialized languages such as Matlab (Mathworks 2017) or R (R Development Core Team 2015) have become the favorite tools nowadays along with Python (van Rossum 1995). With the exception of the reproduction of Perozzo's plot, all figures in this monograph were created with R, as we will explain in Sect. 3.2 and in the appendix.

<sup>3</sup>While writing his Master's thesis, the first author of this monograph received the Lexis software from Kirill Andreev simply via email in early 2000.

**Fig. 2.4** Change in the Swedish age pyramid as depicted by Luigi Perozzo in 1880 (Source: Timothy Riffe, with kind permission)


## **Chapter 3 Data and Software**

#### **3.1 Data**

#### *3.1.1 Human Mortality Database*

Most of our analyses are based on data from the Human Mortality Database ("HMD", 2017), which can be freely accessed after registration at http://www.mortality.org. The database is a collaborative project of research teams from the Department of Demography at the University of California, Berkeley (USA) and the Max Planck Institute for Demographic Research in Rostock (Germany). It contains *aggregate* mortality statistics such as death counts, population estimates, exposure to risk estimates, life tables as well as some other statistics for more than 35 countries (see Table 3.1). Further distinctions into subpopulations are possible for some countries such as Germany (East and West Germany), the United Kingdom (England and Wales, Northern Ireland, Scotland) or New Zealand (Maori, Non-Maori). The database has its focus on highly developed countries.

Since its launch in 2002, the HMD has become the gold standard for the aggregate level (demographic) analysis of mortality. Apart from the diligent collection of data, its widespread adoption can mainly be attributed to two reasons: (1) Rigorous quality checks are conducted before new data are added to the database. (2) The biggest asset of the HMD is that it does not simply publish processed data. Instead, the HMD estimates life tables and other statistics itself using raw data, applying the same set of methods. Thus, any differences over time or across regions cannot be attributed to different methodologies, for instance, how the life table was closed (HMD 2007).

As some life tables in the HMD are smoothed at ages 80 and higher, we did not rely on life table estimates at all but used exclusively the death counts and the corresponding exposures from the HMD on a 1-calendar-year by 1-age-year grid to estimate death rates. Most of our analyses deal with mortality developments since 1950. We selected this threshold year because of the availability of more data compared to earlier time periods. Furthermore, it also marks the beginning of a new era: Most gains in life expectancy are nowadays due to survival improvements among the elderly (Christensen et al. 2009), a development that was virtually non-existent before the middle of the twentieth century. Kannisto (1994), for instance, estimated that the onset of sustained decline in old-age mortality occurred for women in Switzerland, Belgium and Sweden in 1956.

**Table 3.1** Countries covered in the Human Mortality Database and data coverage after 1950 on January 10th, 2017, when the most recent update of data was conducted for the present monograph

As shown in Table 3.1, total deaths range from barely 100,000 (Iceland) to more than 130 million in the United States. We analyzed all countries; the only exceptions are Chile and the Maori population of New Zealand due to problematic data quality (Jdanov et al. 2008) and the low number of years covered (Chile). Nevertheless, we did not include those figures for all countries and both sexes as it would have resulted in a monograph consisting of hundreds of additional pages. We typically restricted ourselves, instead, to a few examples that feature interesting characteristics.

#### *3.1.2 Cause-Specific Death Counts in the United States*

The National Center for Health Statistics of the United States provides a unique collection: Individual death counts by sex, age at death, year of death, cause of death, and many more characteristics can be freely downloaded from its web page. The data are available since 1968 in annual files. Additionally, the website of the National Bureau of Economic Research (NBER) provides data since 1959, which we used in our analyses. The last year in our analysis is 2014. With the exception of 1972, when only a 50% sample was taken, each file contains all deaths in the United States. In the analysis by cause of death in later chapters of this volume, we simply multiplied the number of deaths for a given age, sex, and cause in the year 1972 by a factor of 2.
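The adjustment for the 1972 half sample amounts to a simple reweighting. The Python sketch below is only an illustration of that step: the death counts are made-up numbers, not values from the NCHS or NBER files, and the names `SAMPLE_WEIGHTS` and `adjust_counts` are hypothetical.

```python
# Weight applied to the recorded counts of a given year; years that are not
# listed contain all deaths and keep a weight of 1.
SAMPLE_WEIGHTS = {1972: 2}   # 1972 file is a 50% sample, so counts are doubled


def adjust_counts(recorded, weights):
    """Scale recorded death counts per year by the inverse sampling fraction."""
    return {year: count * weights.get(year, 1) for year, count in recorded.items()}


# Made-up counts for one age, sex, and cause combination (illustration only).
recorded_deaths = {1971: 1040, 1972: 515, 1973: 1012}
adjusted_deaths = adjust_counts(recorded_deaths, SAMPLE_WEIGHTS)

for year in sorted(adjusted_deaths):
    print(year, recorded_deaths[year], "->", adjusted_deaths[year])
```

Doubling keeps the 1972 counts comparable with neighboring years, at the price of a larger sampling error for that single year.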

Causes of death are coded by the so-called "International Classification of Diseases" (ICD). Since its introduction in the late nineteenth century, the system has been revised at irregular intervals (Meslé 2006). The tenth revision is currently used. During the first years of our analysis, ICD-7 was used. ICD-8 was in effect in the United States between 1968 and 1978, followed by ICD-9 from 1979 until 1998.

Obtaining consistent time series of causes of death across ICD revisions requires meticulous work and care (e.g., Meslé and Vallin 1996; Pechholdová 2009). We therefore decided to use only very broad categories for causes of death and followed primarily the coding of Janssen et al. (2003) and of Meslé and Vallin (2006a). Both papers include an appendix with detailed ICD codes across the four revisions required in our analysis.

Table 3.2 is split into two halves. The upper panel provides the ICD codes we used to extract the causes of death, whereas the lower panel lists the number of deaths in absolute and relative terms for the selected causes by sex.

Our database consists of more than 118 million deaths. Although we have selected very few causes, they account for about three quarters of all deaths (Category 13 "Other" is 23.75%). A bit more than 44% of all deaths are classified as originating from circulatory diseases. Within that category, heart diseases account for about one third of all deaths for women and men alike. The almost 10 million deaths from cerebrovascular diseases between 1959 and 2014 represent about eight percent of all deaths. The most common cerebrovascular disease is stroke. Malignant neoplasms ("cancer") are the second largest chapter in the ICD. Regardless of the sex of the decedent, about one in five deaths belongs to that category. We selected three prominent cancer sites: Breast, lung and colorectum. Please note that while there are many more deaths from breast cancer for women, more than 17,000 men also died from it during the 56 years of our observation period. With approximately 8% of all deaths, respiratory diseases are slightly more common than cerebrovascular diseases. Although they are not a major cause of death (2%), we also included information about motor vehicle accidents since they turned out to be an interesting case study for seasonality in deaths, which we analyze in Chap. 9.

**Table 3.2** ICD codes and counts (absolute and relative) for selected causes of death by sex (females, males, and both sexes combined), 1959–2014

#### *3.1.3 SEER Cancer Register Data 1973–2011*

The Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute of the United States allows researchers access to longitudinal data on the individual level about the incidence of cancer and also includes information about the survival of patients. The data coverage (the SEER data start in 1973) and the large size of the data, combined with the ease of access, make the SEER data an ideal instrument for the analysis of cancer survival by age over calendar time. We used data that were released in April 2014 with a follow-up cutoff date of December 31, 2011 (Surveillance, Epidemiology, and End Results (SEER) Program 2014). The SEER data do *not* cover all cancer diagnoses of the United States. They are a collection of data from several registries. With the exception of Seattle (Puget Sound) and Metropolitan Atlanta, which started in 1974 and 1975, respectively, we only used registries that covered the whole time span from 1973 until the end of 2011. Although we use less data than we could have, we thought that a heterogeneous set of registries would have induced problems for the analysis over time. The registries included in our analysis were: San Francisco-Oakland SMSA, Connecticut, Metropolitan Detroit, Hawaii, Iowa, New Mexico, Utah as well as Seattle and Metropolitan Atlanta.

In our analysis of cancer survival in Chap. 10, we selected five cancer sites: Breast cancer; cancer of the lung and bronchus; cancer of the colon, rectum, and anus; pancreatic cancer; prostate cancer. As shown in Table 3.3, those five cancer sites constitute about 55% of all cancer diagnoses for women as well as for men out of the 4.5 million cases recorded during our observation period. The largest categories are by far breast cancer for women (30.44%) and prostate cancer for men (25.79%). The absolute and relative frequencies of the other cancer sites as well as their respective ICD codes are listed in Table 3.3. While ICD-8 was in use at the beginning of the observation period in 1973 and cancer cases are typically coded by the ICD-O standard, all ICD codes were converted to ICD-10 by SEER.

#### **3.2 Software**

All analyses have been conducted and all figures have been produced using R (Version 3.2.3), a free software environment for statistical computing and graphics (R Development Core Team 2015). The surface maps were created with the image() function and contour lines were added with the contour() function. To facilitate the creation of surface maps of rates of mortality improvement for other researchers, an R package called ROMIplot has been created and uploaded to CRAN, the general archive of R packages. Installation and usage of this package are explained in Appendix "Software: R package ROMIplot".

**Table 3.3** ICD-10 codes and incidence counts (absolute and relative) by cancer site of females, males, and both sexes combined in the SEER Data, 1973–2011


## **Chapter 4 Surface Plots of Observed Death Rates**

#### **4.1 From Death Counts to Death Rates**

The basic units of any mortality analysis are death counts. In most scientific disciplines those counts are expressed as rates by dividing them by a unit of time. Examples are heart rates counting beats per minute or becquerel measuring the radioactive decay of nuclei per second. Things are more complicated when death counts are analyzed: For instance, 30,140 people died at age 80 in Germany in 2000. The corresponding number for Austria is 2,765 (HMD, 2017). Inferring that the risk of dying is more than ten-fold higher in Germany than in Austria is obviously wrong, because the German population is much larger. Death rates are therefore, like all demographic rates, standardized by dividing the counts by the corresponding number of life-years lived (see, for example, Chap. 1.4 in Preston et al. 2001). The latter are often called "exposures" and are typically approximated by an estimate of the mid-year population. In the example above, the death rates at age *x* = 80 in year *t* = 2000, usually denoted as *m*(*x*, *t*), would correspond to:

$$\text{Austria}: \quad m(x, t) = \frac{D(x, t)}{N(x, t)} = \frac{2{,}765}{42{,}070.77} = 0.06572259$$

$$\text{Germany}: \quad m(x, t) = \frac{D(x, t)}{N(x, t)} = \frac{30{,}140}{444{,}400.81} = 0.06782166$$

with death counts and exposures denoted as *D*(*x*, *t*) and *N*(*x*, *t*), respectively. Hence, mortality is still higher in Germany than in Austria but only by about three per cent and not by an order of magnitude. Death rates at single ages *x*, which are used exclusively in this book, are often a good approximation for the continuous force of mortality at the middle of that age, (*x* + 0.5) (Thatcher et al. 1998). Nearly all of the analyses contained in this volume are based on such death rates.
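The comparison above can be reproduced with a few lines of code. The following is a minimal Python sketch (the analyses in this book were carried out in R); the function name `death_rate` is purely illustrative, and the numbers are the death counts and exposures quoted in the text.

```python
def death_rate(deaths, exposure):
    """Death rate m(x, t) = D(x, t) / N(x, t)."""
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    return deaths / exposure


# Death counts and exposures at age 80 in the year 2000, as quoted in the text.
m_austria = death_rate(2765, 42070.77)
m_germany = death_rate(30140, 444400.81)

count_ratio = 30140 / 2765          # ratio of the raw death counts
rate_ratio = m_germany / m_austria  # ratio of the death rates

print(f"Austria: {m_austria:.8f}  Germany: {m_germany:.8f}")
print(f"ratio of counts: {count_ratio:.1f}, ratio of rates: {rate_ratio:.3f}")
```

The ratio of the raw counts reflects mainly the difference in population size, whereas the ratio of the rates compares the risk of dying.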

#### **4.2 Results**

The raw surface plots on the following pages depict the observed death rates for women and men in a few selected countries. Death rates were estimated for single ages and single years from 1950 until the last available year in the Human Mortality Database, in most cases 2014 (see Chap. 3). Our color scheme ranges from blue to green to red. To facilitate interpreting the plots, we added contour lines for various levels of mortality similar to the ones for elevation on topographic maps. The levels of 1 death per 10, per 100, per 1,000, and per 10,000 person-years lived are printed as bold lines. They serve as visual cues only; these levels have no distinct meaning beyond a preference for round numbers.

Generally speaking, we do not think that raw surface plots are the best option to visualize mortality dynamics. That is why we only depict a few countries here. One of the main problems is that the observed rates suffer from random fluctuations: at young ages because death rates are so low, and at older ages because there are so few people left. Thus, the numerator of the observed death rates is relatively small in the first case whereas the denominator is relatively small in the latter case.

What we can observe for Australian women and men in Figs. 4.1 and 4.2 is representative of many countries in the Human Mortality Database:<sup>1</sup> Most contour lines tend to move upwards over time. This indicates that the same level of mortality is being observed at higher and higher ages. Or, expressed differently, mortality is continuously decreasing at almost any given age. Spain and Switzerland in Figs. 4.3–4.6 are further examples of this general trend. It seems noteworthy that the late 1990s appear to have been an important era for major improvements in mortality among young males.

We can already observe here the unfortunate mortality developments that took place in Russia (Figs. 4.7–4.8) as well as in many other eastern European countries (not shown here) that have been distinct from the rest of Europe: Irregular trends, especially among males, and even increasing mortality as depicted by the downward contour lines have been rather the rule than the exception between the 1960s and the early 2000s.

<sup>1</sup>See Figs. A.1–A.6 in the appendix for corresponding plots for France, England and Wales, and Norway.

**Australia, Women**

**Fig. 4.1** "Raw" death rates for women in Australia, 1950–2011 (Data source: Human Mortality Database)

**Australia, Men**

**Fig. 4.2** "Raw" death rates for men in Australia, 1950–2011 (Data source: Human Mortality Database)

**Spain, Women**

**Fig. 4.3** "Raw" death rates for women in Spain, 1950–2014 (Data source: Human Mortality Database)

**Spain, Men**

**Fig. 4.4** "Raw" death rates for men in Spain, 1950–2014 (Data source: Human Mortality Database)

**Switzerland, Women**

**Fig. 4.5** "Raw" death rates for women in Switzerland, 1950–2014 (Data source: Human Mortality Database)

**Switzerland, Men**

**Fig. 4.6** "Raw" death rates for men in Switzerland, 1950–2014 (Data source: Human Mortality Database)

**Russia, Women**

**Fig. 4.7** "Raw" death rates for women in Russia, 1959–2014 (Data source: Human Mortality Database)

**Russia, Men**

**Fig. 4.8** "Raw" death rates for men in Russia, 1959–2014 (Data source: Human Mortality Database)


## **Chapter 5 Surface Plots of Smoothed Mortality Data**

#### **5.1 From Raw Death Rates to Smoothed Death Rates**

We have seen in the previous chapter that "raw" death rates can suffer from considerable random fluctuations. Assuming that data quality is not an issue, this noise can be caused by (1) very small numbers of deaths (numerator), by (2) very few persons exposed to the risk of dying (denominator) or by (3) small populations in general. Problem (1) typically occurs at young ages. We selected age 15 in France in Panel (a) of Fig. 5.1. Despite a large population in general, deaths occur, thankfully, relatively rarely at that age. Problem (2) arises at advanced ages, as shown in the middle panel of the same figure. Very few people are still alive at age 95 in Italy, although it is a large population with relatively high life expectancy. In countries with tens of millions of people, problems (1) and (2) occur only at young and old ages. The smaller the population size, the more ages are affected. Panel (c) illustrates issue (3) using Danish data. The mortality trajectory in highly developed countries is rather smooth around age 80. In countries with just a few million people, considerable random fluctuations can be observed even there. Please note that more than five million people live in Denmark. Hence, the challenge becomes even bigger in smaller countries such as the Baltic states, Luxembourg or, especially, Iceland.
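How strongly the noise depends on the size of the numbers involved can be illustrated with a small calculation. The Python sketch below assumes that death counts are Poisson distributed, in which case the relative standard error of an observed death rate is approximately one over the square root of the expected number of deaths. The exposures and the rate are hypothetical values chosen for illustration only, not data from the Human Mortality Database.

```python
import math


def relative_standard_error(exposure, rate):
    """Approximate relative standard error of an observed death rate.

    Assumes Poisson-distributed deaths with mean exposure * rate, so that
    SE(D / N) / m = 1 / sqrt(N * m).
    """
    expected_deaths = exposure * rate
    return 1.0 / math.sqrt(expected_deaths)


# Hypothetical exposures (person-years) at a single age, same true rate of 0.05.
for exposure in (1_000, 10_000, 100_000):
    rse = relative_standard_error(exposure, 0.05)
    print(f"exposure {exposure:>7}: relative noise of about {100 * rse:.1f}%")
```

Under this assumption, a tenfold smaller population at a given age increases the relative noise by a factor of about three.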

We therefore decided to smooth the data. Myriad methods exist to smooth data. While the pattern over age can be appropriately captured by parametric models, the trajectory over time differs considerably between ages and countries. Our decision was therefore to use a non-parametric smoothing approach. We selected the so-called *P*-spline approach, originally developed by Eilers and Marx (1996), adapted to the analysis of mortality by Currie et al. (2004) and further refined by Camarda (2008). The author, Carlo Giovanni Camarda, also provides the R extension package "MortalitySmooth" (Camarda 2012), which makes it easy and straightforward to apply the method. At its core, the model assumes Poisson distributed death counts with the (log-)exposures as an offset to account for changing population sizes over time and/or age. The method uses *B*-splines as regression bases. Whereas the number and position of the basis functions is crucial for standard smoothing with *B*-splines, the *P*-spline approach uses "too many" bases, which would normally result in overfitting. The *P* in the name of the method refers to the penalization of adjacent regression coefficients that differ too much from each other. Further technical details about the basis functions, the order of the differences, the penalty term, etc. are extensively discussed in the aforementioned references. The bold solid black lines in each panel of Fig. 5.1 depict the data smoothed with *P*-splines for the three given ages over time. One can easily recognize that the selected smoothing method is flexible enough to model irregular developments but is not prone to overfit the data.

**Fig. 5.1** The necessity to smooth raw death rates. Using data for France, Italy, and Denmark, panel (**a**), (**b**) and (**c**) illustrate three sources of random fluctuations: few numbers in the numerator (Panel (**a**) for age 15), few numbers in the denominator (Panel (**b**) for age 95) or small population sizes in general (Panel (**c**) for age 80) (Data source: Human Mortality Database)
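The idea of a penalized Poisson fit with the log-exposures as an offset can be sketched in a few lines. The Python code below is not the "MortalitySmooth" implementation and not a full *P*-spline: as a simplifying assumption it uses one coefficient per observation (no *B*-spline basis), a fixed penalty weight `lam` instead of an automatically selected one, and a single dimension. All function names and the toy data are hypothetical.

```python
import math


def difference_penalty(n, order=2):
    """Return D'D (n x n) for the order-th difference matrix D."""
    rows = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(order):
        rows = [[b - a for a, b in zip(rows[i], rows[i + 1])]
                for i in range(len(rows) - 1)]
    return [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def smooth_rates(deaths, exposure, lam=10.0, order=2, n_iter=100, tol=1e-10):
    """Penalized Poisson smoothing of death rates (log link, exposure offset)."""
    n = len(deaths)
    penalty = difference_penalty(n, order)
    # Start from slightly shifted raw log rates so that zero counts are allowed.
    eta = [math.log((d + 0.5) / e) for d, e in zip(deaths, exposure)]
    for _ in range(n_iter):
        mu = [e * math.exp(h) for e, h in zip(exposure, eta)]  # expected deaths
        # Penalized Newton step: (W + lam * D'D) eta_new = W eta + (deaths - mu)
        lhs = [[lam * penalty[i][j] + (mu[i] if i == j else 0.0)
                for j in range(n)] for i in range(n)]
        rhs = [mu[i] * eta[i] + deaths[i] - mu[i] for i in range(n)]
        new_eta = solve(lhs, rhs)
        change = max(abs(a - b) for a, b in zip(new_eta, eta))
        eta = new_eta
        if change < tol:
            break
    return [math.exp(h) for h in eta]


# Toy data (not real): exponentially rising rates plus an alternating disturbance.
ages = list(range(60, 81))
exposure = [2000.0] * len(ages)
deaths = [round(2000 * 0.01 * math.exp(0.09 * (x - 60))) + (3 if x % 2 else -3)
          for x in ages]
raw = [d / e for d, e in zip(deaths, exposure)]
smoothed = smooth_rates(deaths, exposure)
```

Larger values of `lam` penalize differences between neighboring coefficients more heavily and therefore yield smoother log rates; in the sketch the value is simply fixed by hand.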

The univariate time series of Fig. 5.1 are synthetic. Only cartoon characters such as Bart Simpson or Eric Cartman can retain their age over time. In reality, each individual is one year older one year later. Therefore we smoothed the data simultaneously over age and time using the function Mort2Dsmooth of Camarda's package "MortalitySmooth" (2012).

Raw death rates for Estonian women aged 60–80 years from 1980 to 2000 are illustrated in the left panel of Fig. 5.2 as a three-dimensional mortality surface. The general shape of increasing mortality over age can easily be observed. The right panel, featuring smoothed data, also shows the decline in mortality at higher ages over time, which is difficult to track down in the presence of noise in the data.

**Fig. 5.2** 3D plot of raw and smoothed death rates of Estonian women aged 60–80 years in 1980– 2000 (Data source: Human Mortality Database)

The selected three-dimensional perspective plot appears appealing at first sight. The choice of angle and elevation is somewhat arbitrary, though, and allows one to accentuate certain features and suppress others. Since we often want to use the mortality surface for exploratory purposes, we have to give equal exposure to each unit. Therefore, we projected the three-dimensional data on the two-dimensional Lexis plane, denoting the level of mortality by different colors (see Fig. 5.3 as an example).

Comparable to topographic maps, we added contour lines to depict the same levels of mortality. The general upward tendency of the contour lines indicates that the same level of mortality is shifting to higher and higher ages. Thus, for a given age, mortality is decreasing, resulting in an increase in life expectancy.
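The link between upward-moving contour lines and declining mortality can be made concrete with a toy surface. The Python sketch below uses a hypothetical Gompertz-type surface (rates rising with age and falling over calendar time) that is not fitted to any real data; `contour_age` and `toy_rate` are illustrative names. For each year it reports the lowest age at which the death rate reaches a given level, which is where the corresponding contour line would be drawn.

```python
import math


def contour_age(rates_by_age, ages, level):
    """Lowest age at which the death rate reaches `level` (None if never)."""
    for age, rate in zip(ages, rates_by_age):
        if rate >= level:
            return age
    return None


def toy_rate(age, year):
    # Hypothetical surface: exponential increase over age,
    # decline of 2% per calendar year. Not real data.
    return 0.001 * math.exp(0.1 * (age - 40)) * math.exp(-0.02 * (year - 1980))


ages = list(range(40, 101))
years = [1980, 1990, 2000]

# Age at which the level of 1 death per 100 person-years is reached, by year.
contour = {
    year: contour_age([toy_rate(age, year) for age in ages], ages, 0.01)
    for year in years
}
print(contour)
```

Because mortality declines over time at every age in this toy surface, the age at which a fixed level is reached rises from year to year, which is the upward tendency of the contour lines described above.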

#### **5.2 Results**

Figures 5.4–5.11 depict the same set of countries as Figs. 4.1–4.8 in Chap. 4 for a proper comparison between "raw" rates and smoothed rates.<sup>1</sup> The smoothed surface maps make the major trends in the data more pronounced, such as the almost parallel straight upward lines in Australia, Spain, and Switzerland or the sudden survival improvements among young Spanish men, starting in about 1990. Large random fluctuations due to very few deaths, as we have seen in the plot of raw death rates among children in Switzerland (Figs. 4.5 and 4.6), are also removed by the smoothing procedure. While smoothing intrinsically involves some dampening of sudden changes in trends, the surfaces obtained with the automatically selected smoothing parameters still feature, for instance, the mortality crises among Russian men during the 1980s and 1990s. We do not want to go into further detail here as these smoothed surface maps serve as the major building blocks for the surface maps of rates of mortality improvement, which are the focus of our book and are presented in the next chapter.

<sup>1</sup>The appendix therefore also contains maps of smoothed death rates for France, England & Wales, and Norway. They can be found in Figs. A.7–A.12.

**Estonia, Women**

**Fig. 5.3** Death rates of Estonian women aged 60–80 years in 1980–2000 as an example of smoothed death rates on the Lexis plane (Data source: Human Mortality Database)

**Australia, Women**

**Fig. 5.4** Smoothed death rates for women in Australia, 1950–2011 (Data source: Human Mortality Database)

**Australia, Men**

**Fig. 5.5** Smoothed death rates for men in Australia, 1950–2011 (Data source: Human Mortality Database)

**Switzerland, Women**

**Fig. 5.6** Smoothed death rates for women in Switzerland, 1950–2014 (Data source: Human Mortality Database)

**Switzerland, Men**

**Fig. 5.7** Smoothed death rates for men in Switzerland, 1950–2014 (Data source: Human Mortality Database)

**Spain, Women**

**Fig. 5.8** Smoothed death rates for women in Spain, 1950–2014 (Data source: Human Mortality Database)

**Spain, Men**

**Fig. 5.9** Smoothed death rates for men in Spain, 1950–2014 (Data source: Human Mortality Database)

**Russia, Women**

**Fig. 5.10** Smoothed death rates for women in Russia, 1959–2014 (Data source: Human Mortality Database)

**Russia, Men**

**Fig. 5.11** Smoothed death rates for men in Russia, 1959–2014 (Data source: Human Mortality Database)


## **Chapter 6 Surface Plots of Rates of Mortality Improvement**

#### **6.1 From Smoothed Death Rates to Rates of Mortality Improvement**

The colors and contour lines in Fig. 5.3 also suggest a change in pace over time: Each level of mortality seems to change its slope in the early 1990s. We argue that those trend changes are better illustrated with "rates of mortality improvement", which we labeled "ROMIs", than with (smoothed) surface maps of mortality. Given death rates at age *x* in year *t*, *m*(*x*, *t*), we defined the rates of mortality improvement, *ρ*(*x*, *t*), by assuming a constant rate of change within the period of comparison. In this monograph, we only used annual changes. Hence:

$$\rho(x, t) = -\log_{e}\left(\frac{m(x, t+1)}{m(x, t)}\right)$$

It is simply a reformulation of the standard equation for growth with a constant rate *r*: *P*(*t*) = *P*(0)*e*<sup>*rt*</sup> (e.g., Keyfitz 1977). The minus sign ensures that survival improvements yield positive numbers. We expressed the respective values for *ρ* in percent. Our measure is comparable to that of Kannisto et al. (1994), who used a discrete version of the growth equation and aggregated several ages and years.
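The definition translates directly into code. The following Python sketch computes rates of mortality improvement from a small matrix of death rates; it is an illustration of the formula only, not the ROMIplot package, and the function names and the example rates are hypothetical.

```python
import math


def romi(m_t, m_t_plus_1):
    """rho(x, t) = -log(m(x, t+1) / m(x, t)), returned as a fraction."""
    return -math.log(m_t_plus_1 / m_t)


def romi_surface(rates):
    """Apply romi() along time for every age.

    rates[i][j] is the death rate at the i-th age in the j-th year; the
    result has one column fewer because each value compares two years.
    """
    return [[romi(row[j], row[j + 1]) for j in range(len(row) - 1)]
            for row in rates]


# Hypothetical death rates for two ages (rows) and three years (columns).
rates = [
    [0.0100, 0.0098, 0.0099],
    [0.0500, 0.0480, 0.0470],
]
rho = romi_surface(rates)

for row in rho:
    print([f"{100 * value:.2f}%" for value in row])
```

Positive values indicate declining mortality between two consecutive years; an increase, such as the one from 0.0098 to 0.0099 in the first row, yields a negative value.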

Figure 6.1 illustrates those ROMIs again with data for Estonian women. To provide a more comprehensive overview, we expanded the age range as well as calendar time. No change or negligible changes (−0.5% ≤ *ρ* ≤ 0.5%) are depicted in white. Slight improvements (0.5% < *ρ* ≤ 2.0%) are shown in three shades of blue, larger improvements in green colors (2.0% < *ρ* ≤ 4.0%) and very strong improvements (*ρ* > 4.0%) in red colors and yellow. If mortality increased, i.e., the survival conditions worsened, we used darker shades of gray for larger mortality increases. Please note that an annual change of *ρ* = 0.035 = 3.5% cuts mortality in half in less than 20 years.<sup>1</sup> But even at *ρ* = 2%, which we listed as the threshold from moderate to strong improvements, it requires less than 35 years for a reduction by 50%.

**Estonia, Women**

**Fig. 6.1** Example of rates of mortality improvement on the Lexis plane: Estonian women aged 0 to 100 years in 1959–2012 (Data source: Human Mortality Database)
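The color classes used in the ROMI maps amount to a simple lookup on the value of *ρ*. The Python sketch below encodes the thresholds named in the text (0.5%, 2.0%, and 4.0%); which side of each threshold is inclusive, and the mirrored lower bound of −0.5% for the gray class, are assumptions made for this illustration, as are the function name and the class labels.

```python
def romi_class(rho_percent):
    """Map a rate of mortality improvement (in percent) to its color class.

    The thresholds follow the text; the treatment of values lying exactly
    on a boundary is an assumption of this sketch.
    """
    if rho_percent < -0.5:
        return "gray (mortality increase)"
    if rho_percent <= 0.5:
        return "white (no or negligible change)"
    if rho_percent <= 2.0:
        return "blue (slight improvement)"
    if rho_percent <= 4.0:
        return "green (larger improvement)"
    return "red/yellow (very strong improvement)"


for value in (-1.2, 0.1, 1.5, 3.5, 4.8):
    print(f"{value:5.1f}% -> {romi_class(value)}")
```

The actual maps additionally distinguish several shades within the gray, blue, green, and red classes, which this sketch does not attempt to reproduce.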

How can we interpret Fig. 6.1, which could be mistaken for a piece of modern art at first glance? The main shapes appear to be vertical. This implies that mortality changes affected virtually all age groups at the same moment in time, which are classical period effects. We can also see that white and gray are the dominant colors for females in Estonia for the 1970s and the 1980s. Thus, mortality remained more or less constant during those two decades. During the 1980s at ages 35–60, we can even spot some dark gray areas that correspond to increasing levels of mortality. We can witness a trend reversal approximately in 1990. Within a couple of years, Estonian women at almost all ages experienced remarkable survival improvements. The colors illustrate that mortality dropped by more than 4% for several years at some ages. At such a rapid pace, it takes about 10 years to cut mortality by a third.
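The time spans quoted for different rates of improvement follow from the assumption of a constant annual rate *ρ*, under which mortality after *n* years equals the initial level multiplied by *e*<sup>−*ρn*</sup>. The Python sketch below, with illustrative function names, evaluates this relationship for the values mentioned in the text.

```python
import math


def years_to_reduce(rho, fraction_remaining=0.5):
    """Years needed until mortality falls to `fraction_remaining` of its
    initial level, assuming a constant annual rate of improvement rho."""
    return -math.log(fraction_remaining) / rho


def remaining_after(rho, years):
    """Share of the initial mortality level left after `years` years."""
    return math.exp(-rho * years)


print(f"halving time at 3.5%: {years_to_reduce(0.035):.2f} years")
print(f"halving time at 2.0%: {years_to_reduce(0.02):.2f} years")
print(f"reduction after 10 years at 4%: {100 * (1 - remaining_after(0.04, 10)):.0f}%")
```

The first two lines correspond to the statements that mortality is halved in less than 20 years at 3.5% and in less than 35 years at 2%; the third corresponds to the reduction by about a third within 10 years at 4%.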

#### **6.2 Results**

Figures 6.2–6.21 (pages 46–65) depict Lexis diagrams of rates of mortality improvement ("ROMIs"), which approximate the negative time derivative of the logarithm of age-specific death rates. We argue that those maps are better able to illustrate mortality dynamics than the commonly used "heat maps" of mortality. We plotted our first ROMIs on the Lexis surface about 10 years ago (Rau et al. 2008). In the meantime, those plots have become more commonplace, especially among actuaries, to visualize mortality dynamics. Our method can be considered as a descriptive tool. It is able to detect the predominant dynamics of mortality (or of any other phenomenon measured on the Lexis surface). We think that those "ROMI" maps provide better insights into mortality dynamics than standard surface maps but are equally intuitive to understand.

During the 1950s, the first years of our observation period, survival improved tremendously especially for infants, children, and young adults. The most remarkable declines in mortality were recorded for Japanese females (Fig. 6.14, page 58). After the end of World War II, life expectancy in Japan was below the average of western European countries. According to data from the Human Mortality Database, life expectancy for Japanese females rose from 60.9 years in 1950 to 72.3 in 1963. Thus, life expectancy increased by almost 1 year within each calendar year during that time span! But also France (Fig. 6.9, p. 53), Italy (Fig. 6.13, p. 57), England & Wales (Fig. 6.7, p. 51) or the United States (Fig. 6.21, p. 65), to name only a few, gained several years of life due to mortality declines at younger ages.

<sup>1</sup>0.5 *m*(*x*, *t*) = *m*(*x*, *t*) *e*<sup>−*ρt*</sup>; 0.5 = *e*<sup>−*ρt*</sup>; log<sub>*e*</sub>(0.5) = −*ρt*; *t* = −log<sub>*e*</sub>(0.5)/*ρ* = −log<sub>*e*</sub>(0.5)/0.035 = 19.80421, where *t* in the exponent denotes the number of years elapsed.

**Australia, Women**

**Fig. 6.2** Rates of mortality improvement for women in Australia, 1950–2010 (Data source: Human Mortality Database)

**Austria, Women**

**Fig. 6.3** Rates of mortality improvement for women in Austria, 1950–2013 (Data source: Human Mortality Database)

**Belarus, Women**

**Fig. 6.4** Rates of mortality improvement for women in Belarus, 1950–2013 (Data source: Human Mortality Database)

**Czech Republic, Women**

**Fig. 6.5** Rates of mortality improvement for women in Czech Republic, 1950–2013 (Data source: Human Mortality Database)

**Denmark, Women**

**Fig. 6.6** Rates of mortality improvement for women in Denmark, 1950–2013 (Data source: Human Mortality Database)

**England & Wales, Women**

**Fig. 6.7** Rates of mortality improvement for women in England & Wales, 1950–2012 (Data source: Human Mortality Database)

**Finland, Women**

**Fig. 6.8** Rates of mortality improvement for women in Finland, 1950–2014 (Data source: Human Mortality Database)

**France, Women**

**Fig. 6.9** Rates of mortality improvement for women in France, 1950–2013 (Data source: Human Mortality Database)

**Fig. 6.10** Rates of mortality improvement for women in western Germany, 1956–2012 (Data source: Human Mortality Database)

**Germany (East), Women**

**Fig. 6.11** Rates of mortality improvement for women in eastern Germany, 1956–2012 (Data source: Human Mortality Database)

**Hungary, Women**

**Fig. 6.12** Rates of mortality improvement for women in Hungary, 1950–2013 (Data source: Human Mortality Database)

**Italy, Women**

**Fig. 6.13** Rates of mortality improvement for women in Italy, 1950–2011 (Data source: Human Mortality Database)

**Japan, Women**

**Fig. 6.14** Rates of mortality improvement for women in Japan, 1950–2013 (Data source: Human Mortality Database)

**Netherlands, Women**

**Fig. 6.15** Rates of mortality improvement for women in Netherlands, 1950–2011 (Data source: Human Mortality Database)

**Poland, Women**

**Fig. 6.16** Rates of mortality improvement for women in Poland, 1958–2013 (Data source: Human Mortality Database)

**Russia, Women**

**Fig. 6.17** Rates of mortality improvement for women in Russia, 1959–2013 (Data source: Human Mortality Database)

**Russia, Men**

**Fig. 6.18** Rates of mortality improvement for men in Russia, 1959–2013 (Data source: Human Mortality Database)

**Spain, Women**

**Fig. 6.19** Rates of mortality improvement for women in Spain, 1950–2013 (Data source: Human Mortality Database)

**Ukraine, Women**

**Fig. 6.20** Rates of mortality improvement for women in Ukraine, 1959–2012 (Data source: Human Mortality Database)

**USA, Women**

**Fig. 6.21** Rates of mortality improvement for women in USA, 1950–2013 (Data source: Human Mortality Database)

Another vertical pattern, suggesting a period effect, can be observed in many countries during the 1970s. Among the countries presented here, Australia (p. 46), Finland (p. 52), western Germany (p. 54), Spain (p. 63) and the United States (p. 65) belong to that group, for instance. We can only speculate that the so-called "cardiovascular revolution" (Meslé and Vallin 2006b) played an important role. It was during the 1970s that medical procedures such as bypass surgery and devices such as pacemakers to treat cardiovascular diseases were introduced to larger parts of the population. But it was not only the treatment but also the prevention of cardiovascular diseases by drugs such as beta blockers that received a major boost during that time frame.

Many countries that benefited from that period effect during the 1970s exhibit a pattern that resembles a cohort effect in the years thereafter for persons aged approximately 40–80 in the 1970s. It could be argued that those green and red colors along the 45° line that last into the 2000s reflect a protective effect for those cohorts that benefited first from the new treatment and prevention methods during the 1970s. Please note that this does not imply that subsequent cohorts did not benefit from the advances of the 1970s. Had they not benefited, we would see gray areas along those cohorts. Instead we typically encounter positive developments, just at a smaller scale than the ones of the initial cohorts. This pattern is most visible for Japan (p. 58), Spain (p. 63), Finland (p. 52) and Australia (p. 46), and, to a lesser degree, in France (p. 53) and western Germany (p. 54).
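Reading period and cohort effects off the Lexis plane relies on simple arithmetic: a period effect sits at a fixed calendar year (a vertical line), whereas a birth cohort moves along the 45° diagonal because it ages by one year per calendar year. The Python sketch below illustrates this with hypothetical function names; it uses the common approximation that the birth cohort equals calendar year minus age, ignoring that a cell defined by single age and single year contains members of two adjacent birth cohorts.

```python
def birth_cohort(year, age):
    """Approximate year of birth for persons aged `age` in `year`."""
    return year - age


def cohort_track(cohort, years):
    """Cells (year, age) that a birth cohort passes through on the Lexis plane."""
    return [(year, year - cohort) for year in years if year - cohort >= 0]


# Persons aged approximately 40-80 during the 1970s belong to these cohorts:
oldest = birth_cohort(1970, 80)
youngest = birth_cohort(1979, 40)
print(f"birth cohorts from about {oldest} to {youngest}")

# Following the 1920 cohort in ten-year steps along the diagonal:
print(cohort_track(1920, range(1970, 2001, 10)))
```

A pattern that follows such a track over several decades, rather than staying at one calendar year, is what the text describes as resembling a cohort effect.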

This period effect followed by a cohort effect is not a universal finding, however. Even among western European countries, we detect some outliers. The most prominent example is probably the case of Danish women (p. 50). While the past 20 years or so have shown moderate to strong survival improvements across most of the age range as indicated by the green and red colors, there is one issue that sets Denmark apart from other countries: A cohort effect from the 1960s that lasted well into the early 1990s with stagnating survival, shown in white, or even increasing mortality as suggested by the gray shades. It has now been conclusively shown that Danish women born between the two world wars and their relatively high smoking prevalence are at the root of this cohort effect (e.g., Jacobsen et al. 2002, 2004, 2006; Lindahl-Jacobsen et al. 2016). This cohort effect coincides with relatively minor life expectancy gains among Danish women during that period. The United States (p. 65) also features a strange pattern. It will be investigated further when we analyze rates of mortality improvement for selected causes of death in Chap. 7.

Similar to the Danish situation, modest life expectancy gains or even losses during the 1970s and 1980s have also been observed in several eastern European countries. They were not caused by a cohort effect, however: the vertical shapes for Hungary (p. 56), the Czech Republic (p. 49), Poland (p. 60) or the former GDR (p. 55) indicate a clear period effect. It is more likely that those countries could not (yet) reap the benefits of the cardiovascular revolution that many western countries experienced during that time period. This is supported by the subsequent strong period effects in many of those countries. The most prominent example is probably the former GDR/eastern Germany. When Germany re-unified, there was a difference of almost 3 years among women for life expectancy at birth. Just 15 years later, the difference between the two parts of Germany had virtually disappeared for females. According to Germany's Federal Health Reporting database (www.gbe-bund.de), mortality from diseases of the circulatory system declined by 47% in eastern Germany between 1990 and 2005.

The most turbulent mortality history during the last 60 years has probably been experienced by Russia and other former Soviet republics (see Figs. 6.4, 6.17, and 6.20 on pages 48, 61, and 64). Since the 1960s, those countries (or then parts of the USSR) have seen sudden mortality spikes followed by survival improvements. Those were typically period effects as the vertical patterns in those figures indicate. While we have otherwise focused on the mortality of women, we included the case of Russian men in Fig. 6.18 on page 62. Although there were a few years featuring survival improvements, for instance during the mid-1980s, coinciding with Gorbachev's anti-alcohol campaign (Leon et al. 1997), life expectancy of Russian men declined by more than 5 years between 1965 and 2000. France Meslé (2004) points out in her decomposition analysis that the majority of life years lost was due to increasing mortality from circulatory diseases and violent deaths. Those are precisely the causes that are mainly responsible for the increase in life expectancy during the first decade of the 2000s: "Our analyses have shown that the recent improvements in life expectancy have mainly been driven by reductions in mortality from circulatory diseases and external causes" (Shkolnikov et al. 2013, p. 930).

The last few years of our observation period provide a mixed result. Life expectancy continued to increase for Russian men, primarily caused by annual survival improvements of more than 3% at ages 70 and above. Mortality declined modestly between ages 40 and 70. And there are some ages between 35 and 40 where mortality increased slightly again. But it is too early to determine whether we see another trend reversal.


## **Chapter 7 Surface Plots of Rates of Mortality Improvement for Selected Causes of Death in the United States**

The current chapter shows how surface maps of rates of mortality improvement can also be used to analyze causes of death. This might enable researchers to gain better insights into the underlying mortality dynamics than merely looking at the Lexis surface of rates of improvement for all-cause mortality. We selected the United States for two reasons:


We used again the same techniques and color schemes as in Chap. 6. To avoid any spurious conclusions due to small numbers of deaths, we excluded deaths above age 95 and below age 20.

**Rates of Mortality Improvement, Circulatory Diseases, Women**

**Fig. 7.1** Rates of mortality improvement for all circulatory diseases for women in the United States aged 20–95 between 1959 and 2013 (Data source: Human Mortality Database, National Center for Health Statistics, and National Bureau of Economic Research)

**Rates of Mortality Improvement, Heart Diseases, Women**

**Fig. 7.2** Rates of mortality improvement for heart diseases for women in the United States aged 20–95 between 1959 and 2013 (Data source: Human Mortality Database, National Center for Health Statistics, and National Bureau of Economic Research)

**Rates of Mortality Improvement, Cerebrovascular Diseases, Women**

**Fig. 7.3** Rates of mortality improvement for cerebrovascular diseases for women in the United States aged 20–95 between 1959 and 2013 (Data source: Human Mortality Database, National Center for Health Statistics, and National Bureau of Economic Research)

More than 50 million deaths, corresponding to almost 45% of all deaths, can be attributed to diseases of the circulatory system. The ROMI plot for mortality due to these causes is depicted in Fig. 7.1. Heart diseases (Fig. 7.2), e.g., myocardial infarction, and cerebrovascular diseases (Fig. 7.3) such as stroke constitute about 95% of all deaths from circulatory diseases. We can draw at least two conclusions from those figures:


So if circulatory diseases were the main reason for life expectancy gains in many European countries during the 1980s and 1990s, why did life expectancy in the United States not increase in a similar manner, given that mortality from heart diseases, stroke and similar causes also declined remarkably in the US?

Since circulatory diseases can be excluded as an explanation, we turned our attention to malignant neoplasms ("cancers"). They are responsible for more than one in five deaths. Among the various cancer sites, we decided to look at three major sub-categories: colorectal, breast and lung cancer (Figs. 7.5, 7.6, 7.7, and 7.8) in addition to mortality from all cancers (Fig. 7.4).

Deaths from any kind of cancer for women (Fig. 7.4) show a mixed pattern: Below age 50 we can detect a continuous trend of improving survival conditions throughout most of our observation period. After the mid-1980s, lower mortality from cancer also extends to higher and higher ages (Fig. 7.4). Those survival improvements, which show some characteristics of a cohort effect, could be influenced by declining mortality from colorectal cancers as suggested by Fig. 7.5. Breast cancer (Fig. 7.6) also displays steady improvements, albeit starting only in the 1990s. The main cause for the poor development of female life expectancy during the late twentieth century is probably lung cancer. Fig. 7.7 on page 77 shows the strongest cohort effect the authors of this book have encountered when analyzing rates of mortality improvement by cause of death. Men (Fig. 7.8, p. 78) also feature such a strong cohort effect. The pattern for males is located further left on the Lexis map, i.e., earlier in calendar time, supporting the idea of the "'cigarette diffusion' explanation [...] that convergence in male and female smoking is the byproduct of a female lag in the process of cigarette adoption, diffusion, and abatement" (e.g., Pampel 2001, p. 388). Furthermore, our figures on lung cancer, in conjunction with the detrimental effects shown in Fig. 7.9 for respiratory diseases, are in line with Wang and Preston (2009, p. 398) who argue that "[b]ecause of changes in smoking behavior that have already occurred or that can be reliably projected, American mortality is likely to fall more rapidly than is commonly anticipated."

**Rates of Mortality Improvement, All Cancers, Women**

**Fig. 7.4** Rates of mortality improvement for malignant neoplasms for women in the United States aged 20–95 between 1959 and 2013 (Data source: Human Mortality Database, National Center for Health Statistics, and National Bureau of Economic Research)

**Rates of Mortality Improvement, Colorectal Cancer, Women**

**Fig. 7.5** Rates of mortality improvement for colorectal cancer for women in the United States aged 20–95 between 1959 and 2013 (Data source: Human Mortality Database, National Center for Health Statistics, and National Bureau of Economic Research)

**Rates of Mortality Improvement, Breast Cancer, Women**

**Fig. 7.6** Rates of mortality improvement for breast cancer for women in the United States aged 20–95 between 1959 and 2013 (Data source: Human Mortality Database, National Center for Health Statistics, and National Bureau of Economic Research)

**Rates of Mortality Improvement, Lung Cancer, Women**

**Fig. 7.7** Rates of mortality improvement for lung cancer for women in the United States aged 20–95 between 1959 and 2013 (Data source: Human Mortality Database, National Center for Health Statistics, and National Bureau of Economic Research)

**Rates of Mortality Improvement, Lung Cancer, Men**

**Fig. 7.8** Rates of mortality improvement for lung cancer for men in the United States aged 20–95 between 1959 and 2013 (Data source: Human Mortality Database, National Center for Health Statistics, and National Bureau of Economic Research)

**Rates of Mortality Improvement, Respiratory Diseases, Women**

**Fig. 7.9** Rates of mortality improvement for respiratory diseases for women in the United States aged 20–95 between 1959 and 2013 (Data source: Human Mortality Database, National Center for Health Statistics, and National Bureau of Economic Research)

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Chapter 8 Surface Plots of Age-Specific Contributions to the Increase in Life Expectancy**

#### **8.1 How to Estimate Age-Specific Contributions to the Change in Life Expectancy**

Different perspectives can provide different insights into mortality dynamics. Chapters 6 and 7 investigated the relative change of death rates over time. The same "ROMI" does not necessarily translate to the same change of life expectancy, though, neither over time nor at different ages: A large reduction of infant mortality in the past had a major impact on life expectancy whereas the same proportional reduction would affect life expectancy only slightly since infant mortality is already (and thankfully) at a very low level. Analogously, the same rate of mortality improvement at the same time may have considerably different effects on life expectancy. For instance, an annual mortality decline by *x* per cent at age 80 has a much larger impact on life expectancy than a decline by *x* per cent at age 100.
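The last point can be illustrated numerically. The following is a minimal sketch, assuming a purely hypothetical Gompertz mortality schedule (the parameters are not fitted to any data) and a constant hazard within each single year of age. The function names `life_expectancy` and `gain_from_decline` are illustrative only.

```python
import numpy as np

def life_expectancy(mx):
    """Life expectancy at birth from single-age death rates, assuming
    a constant hazard within each one-year age interval."""
    # Survivors at exact ages: l_0 = 1, l_{x+1} = l_x * exp(-m_x)
    lx = np.concatenate(([1.0], np.exp(-np.cumsum(mx))))
    # Person-years lived in [x, x+1) under a piecewise-constant hazard
    Lx = (lx[:-1] - lx[1:]) / mx
    return Lx.sum()

ages = np.arange(0, 130)
# Hypothetical Gompertz schedule (assumption, not estimated from data)
mx = 0.00002 * np.exp(0.1 * ages)

def gain_from_decline(age, pct=10.0):
    """Gain in life expectancy (years) if the death rate at a single
    age falls by pct per cent while all other ages stay unchanged."""
    changed = mx.copy()
    changed[age] *= 1 - pct / 100
    return life_expectancy(changed) - life_expectancy(mx)

gain_80 = gain_from_decline(80)    # same proportional decline ...
gain_100 = gain_from_decline(100)  # ... at a much higher age
print(f"age 80: {gain_80 * 365.25:.1f} days, age 100: {gain_100 * 365.25:.1f} days")
```

Under these assumptions the gain from the decline at age 80 is considerably larger than the gain at age 100, because far more people survive to age 80 and they have more remaining years to live.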

We therefore decided to estimate the age-specific contributions to the change in life expectancy. Among the various methods available (see Canudas-Romo (2003) for an overview), we applied the approach of Arriaga (1984) using the exposition of Preston et al. (2001, pp. 64–65). Having data for single ages available allowed us to further simplify the notation. With the conventional life table notation *lx* for the life table survivors at age *x*, *Lx* for the number of life years lived at age *x*, and *Tx* for the number of life years lived at age *x* and above, we can estimate *Δx*, the contribution of mortality at age *x* to differences in life expectancy between two points in time (or between any two life tables), denoted by superscripts <sup>1</sup> and <sup>2</sup>, as:

$$
\Delta_x = \frac{l_x^1}{l_0^1} \cdot \left(\frac{L_x^2}{l_x^2} - \frac{L_x^1}{l_x^1}\right) + \frac{T_{x+1}^2}{l_0^1} \cdot \left(\frac{l_x^1}{l_x^2} - \frac{l_{x+1}^1}{l_{x+1}^2}\right).
$$

The age-specific contribution to the difference in life expectancy, *Δx*, consists of two parts. The first part (up to the + sign) estimates the *direct* effect, i.e., the change in life expectancy only due to the change in mortality at this given age *x*. The second part is the sum of an indirect effect and an interaction effect (Preston et al. 2001, p. 64). A geometric explanation might help to understand what is meant by this second component, as it often appears to be confusing: Life expectancy can be interpreted as the area under the survival curve. In case of a decline in mortality at age *x* at time point *t*, the survival curve at this age is higher than at time point *t* − 1. This is what is meant by the direct effect. For the sake of simplicity, let's assume that mortality only changed at age *x*. Nevertheless, the survival curve will also be higher at age *x* + 1: A survival curve where the survival at age *x* + 1 was at the same level as at *t* − 1 would require an increase in mortality. This "wake" of a change in mortality at one age affecting the survival function at subsequent ages is estimated by the second component.
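The decomposition can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used for the figures: `toy_life_table` is a hypothetical helper that builds a life table with radix 1 from a made-up Gompertz schedule, and the open-ended last age group is assigned only a direct effect, a common convention that the text above does not spell out. Both life tables are assumed to share the same radix.

```python
import numpy as np

def toy_life_table(mx):
    """Hypothetical helper: single-age life table with radix 1, assuming a
    constant hazard within each age and an open-ended last age group."""
    mx = np.asarray(mx, dtype=float)
    # l_0 = 1 and l_{x+1} = l_x * exp(-m_x)
    lx = np.concatenate(([1.0], np.exp(-np.cumsum(mx[:-1]))))
    Lx = np.empty_like(lx)
    Lx[:-1] = (lx[:-1] - lx[1:]) / mx[:-1]  # person-years in closed ages
    Lx[-1] = lx[-1] / mx[-1]                # person-years in the open age group
    Tx = Lx[::-1].cumsum()[::-1]            # person-years at age x and above
    return lx, Lx, Tx

def arriaga_contributions(table1, table2):
    """Contribution of each single age to e0(table2) - e0(table1)."""
    lx1, Lx1, Tx1 = table1
    lx2, Lx2, Tx2 = table2
    delta = np.empty_like(lx1)
    # First part of the formula: direct effect at age x
    direct = lx1[:-1] / lx1[0] * (Lx2[:-1] / lx2[:-1] - Lx1[:-1] / lx1[:-1])
    # Second part: indirect plus interaction effect (the "wake")
    wake = Tx2[1:] / lx1[0] * (lx1[:-1] / lx2[:-1] - lx1[1:] / lx2[1:])
    delta[:-1] = direct + wake
    # Open-ended age group: direct effect only (assumed convention)
    delta[-1] = lx1[-1] / lx1[0] * (Tx2[-1] / lx2[-1] - Tx1[-1] / lx1[-1])
    return delta

ages = np.arange(0, 111)
mx_then = 0.00002 * np.exp(0.1 * ages)  # hypothetical schedule at time 1
mx_now = 0.9 * mx_then                  # 10 per cent lower at every age
then, now = toy_life_table(mx_then), toy_life_table(mx_now)
delta = arriaga_contributions(then, now)
e0_then = then[2][0] / then[0][0]
e0_now = now[2][0] / now[0][0]
print(round(delta.sum(), 6), round(e0_now - e0_then, 6))
```

A useful check on any implementation is that the age-specific contributions add up to the total difference in life expectancy between the two life tables.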

We followed exactly the same procedure as in Chap. 5 to obtain the required death rates: Raw death rates, based on death counts and corresponding exposure times from the Human Mortality Database (HMD, 2017), were smoothed assuming Poisson distributed death counts using Camarda's MortalitySmooth package (Camarda 2012, 2015). The life table functions *lx*, *Lx*, and *Tx* were estimated using the approach outlined in Chapter 3 of Preston et al. (2001). The values for *ax*, the mean duration lived at age *x* by those who died at age *x*, were taken from the HMD.
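As a rough illustration of the last step, the sketch below derives *lx*, *Lx*, and *Tx* from single-age death rates and *ax* values using the standard conversion from rates to probabilities of dying. The function name `life_table`, the radix of 1, and the input values are assumptions for illustration; the smoothing step with MortalitySmooth is not reproduced here.

```python
import numpy as np

def life_table(mx, ax, radix=1.0):
    """Single-age life table; the last age group is open-ended.

    mx: death rates by single age, ax: mean years lived at age x by those
    dying at age x. Returns survivors lx, person-years Lx, and Tx."""
    mx = np.asarray(mx, dtype=float)
    ax = np.asarray(ax, dtype=float)
    # Probability of dying within one year: q_x = m_x / (1 + (1 - a_x) m_x)
    qx = mx / (1.0 + (1.0 - ax) * mx)
    qx[-1] = 1.0  # everybody dies in the open-ended age group
    lx = radix * np.concatenate(([1.0], np.cumprod(1.0 - qx[:-1])))
    dx = lx * qx
    Lx = np.empty_like(lx)
    # Survivors contribute a full year, the deceased a_x years on average
    Lx[:-1] = lx[1:] + ax[:-1] * dx[:-1]
    # Open-ended group: person-years = survivors / death rate (a_x unused)
    Lx[-1] = lx[-1] / mx[-1]
    Tx = Lx[::-1].cumsum()[::-1]
    return lx, Lx, Tx

# Hypothetical inputs for four ages, the last one open-ended
mx = np.array([0.005, 0.001, 0.002, 0.3])
ax = np.array([0.1, 0.5, 0.5, 0.5])
lx, Lx, Tx = life_table(mx, ax)
e0 = Tx[0] / lx[0]  # life expectancy at the youngest age
```

By construction, the life table death rates (deaths divided by person-years) reproduce the input rates at each closed age, which is a convenient consistency check.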

While any kind of difference in calendar time could be used, we decided to estimate the age-specific contributions within an interval of ten years. That is, we compared 1960 to 1950, 1961 to 1951, and so on. The years on the *x*-axis in Figs. 8.1–8.13 refer to the *latter* time point. Thus, the values at any age *x* in year 1980 denote the contribution of changing mortality at age *x* between 1970 and 1980. The choice of a ten-year difference is, of course, arbitrary, but it also allowed us to express the contribution in "meaningful" units: We used days and weeks and, in exceptional cases of substantial improvements or deterioration in survival, months. The surface maps were plotted using a terrain color scheme: Green indicates moderate contributions to life expectancy. When the color turns to brown, that age alone contributed at least one week to the increase in life expectancy during the decade of observation. Very bright brown areas depict contributions of one month or more. Blue colors denote negative contributions. Just like deeper shades of blue suggest lower depths below sea level on geographic maps, they indicate here changes in age-specific mortality that bring life expectancy down.

Again, we have not included the whole set of countries from the HMD but rather a subset of countries with peculiar features, which we already pointed out in previous chapters.

**Belarus, Men: Contribution of Single Ages** 

**Fig. 8.1** Age-specific contributions to the increase in life expectancy among men during the past 10 years in Belarus, 1969–2014 (Data source: Human Mortality Database)

**Denmark, Women: Contribution of Single Ages** 

**Fig. 8.2** Age-specific contributions to the increase in life expectancy among women during the past 10 years in Denmark, 1960–2014 (Data source: Human Mortality Database)

**Fig. 8.3** Age-specific contributions to the increase in life expectancy among women during the past 10 years in France, 1960–2014 (Data source: Human Mortality Database)

**Germany (East), Women: Contribution of Single Ages to the Increase in Life Expectancy Over a Period of 10 Years**

**Fig. 8.4** Age-specific contributions to the increase in life expectancy among women during the past 10 years in Germany (East), 1966–2013 (Data source: Human Mortality Database)

**Germany (East), Men: Contribution of Single Ages to the Increase in Life Expectancy Over a Period of 10 Years**

**Fig. 8.5** Age-specific contributions to the increase in life expectancy among men during the past 10 years in Germany (East), 1966–2013 (Data source: Human Mortality Database)

**Germany (West), Women: Contribution of Single Ages to the Increase in Life Expectancy Over a Period of 10 Years**

**Fig. 8.6** Age-specific contributions to the increase in life expectancy among women during the past 10 years in Germany (West), 1966–2013 (Data source: Human Mortality Database)

**Japan, Women: Contribution of Single Ages** 

**Fig. 8.7** Age-specific contributions to the increase in life expectancy among women during the past 10 years in Japan, 1960–2014 (Data source: Human Mortality Database)

**Netherlands, Women: Contribution of Single Ages to the Increase in Life Expectancy Over a Period of 10 Years**

**Fig. 8.8** Age-specific contributions to the increase in life expectancy among women during the past 10 years in the Netherlands, 1960–2012 (Data source: Human Mortality Database)

**Poland, Women: Contribution of Single Ages to the Increase in Life Expectancy Over a Period of 10 Years**

**Fig. 8.9** Age-specific contributions to the increase in life expectancy among women during the past 10 years in Poland, 1968–2014 (Data source: Human Mortality Database)

**Poland, Men: Contribution of Single Ages to the Increase in Life Expectancy Over a Period of 10 Years**

**Fig. 8.10** Age-specific contributions to the increase in life expectancy among men during the past 10 years in Poland, 1968–2014 (Data source: Human Mortality Database)

**Russia, Men: Contribution of Single Ages to the Increase in Life Expectancy Over a Period of 10 Years**

**Fig. 8.11** Age-specific contributions to the increase in life expectancy among men during the past 10 years in Russia, 1969–2014 (Data source: Human Mortality Database)

**Sweden, Women: Contribution of Single Ages to the Increase in Life Expectancy Over a Period of 10 Years**

**Fig. 8.12** Age-specific contributions to the increase in life expectancy among women during the past 10 years in Sweden, 1960–2014 (Data source: Human Mortality Database)

**USA, Women: Contribution of Single Ages to the Increase in Life Expectancy Over a Period of 10 Years**

**Fig. 8.13** Age-specific contributions to the increase in life expectancy among women during the past 10 years in the USA, 1960–2014 (Data source: Human Mortality Database)

#### **8.2 Results**

Figures 8.1 and 8.11 for Belarus and Russia, respectively, reiterate our findings of strong period effects from Chap. 6. Although our focus is mainly on mortality dynamics of women, we selected data for men here on purpose since the decline in life expectancy and fluctuations over time were more pronounced for males than for females (e.g., Meslé 2004).

The vertical ROMI patterns in Fig. 6.18 (Chap. 6, p. 62) suggested that all ages between 15 and 75 were affected by the strong positive and negative period effects in Russia. Figure 8.11 in the present chapter, though, allows us to narrow down the age range if we are interested in the contribution to changes in life expectancy. Compared to ten years earlier, changing mortality of men aged between 20 and 50 years appears to be the main contributor to the increase in life expectancy during the 1980s, fueled at least partly by Gorbachev's anti-alcohol campaign (Leon et al. 1997). As we have already seen in Chap. 6, the end of the Soviet Union in the early 1990s induced a major rise in mortality in Russia and other successor states. It seems almost impossible that mortality increased so much at ages 50 to 65 in Belarus (Fig. 8.1) that some single ages depressed life expectancy by one month or more within a ten-year interval. Even more astonishing are the results for Russia (Fig. 8.11), where the change in mortality at single ages between 40 and 55 caused a decline of life expectancy of six weeks and more.

The end of socialism/communism in eastern Europe in the early 1990s was less of a problem for Poland, though, which serves as an example of a country from the former Warsaw Pact (see Fig. 8.9 for women and Fig. 8.10 for men). Whereas mortality also increased for men at working ages throughout the 1970s and 1980s, it took only a few years after the fall of the Iron Curtain to see exactly the same ages contributing two weeks or more to gains in life expectancy throughout a decade, a development that appears to be still ongoing. It took even less time for Polish women to benefit from the regime change than for their male peers. The increase in life expectancy almost immediately after 1989/1990 was primarily triggered by survival improvements among women aged 60–85 years.

A picture similar to that for Poland emerges for females and males from the former "GDR" (Figs. 8.4 and 8.5): Stagnating or even increasing mortality throughout the 1970s and 1980s among men at working ages did not immediately disappear with the end of the political regime. Indeed, mortality even increased slightly for males aged about 30–50 years. Marc Luy (2004, p. 133) showed that "[t]his effect can be attributed almost exclusively to diseases of the digestive system (mainly due to diseases of the liver) and the cause of death chapter 'injury, poisoning and certain other consequences of external causes' (mainly resulting from traffic accidents)." The group that was the fastest to adapt to the new situation was women from the former eastern part of Germany. Their improvements in survival started immediately in 1990, faster than among Polish women or men from eastern Germany. Declining mortality, with single ages contributing at least two weeks to the increase in life expectancy within ten years, was representative of the first two decades after Germany's reunification. Women aged 65 to 80 even contributed one month or more from the mid-1990s to the mid-2000s.

A contribution of two weeks or more of single ages to the increase in life expectancy was already common among women in the former western part of Germany since the 1970s (Fig. 8.6). In fact, the peak of one month and more for women aged 65 to 80 could be interpreted as an indicator for the catching-up period of the "cardiovascular revolution" that already started in the 1970s in the former FRG and many other western countries—see, for instance, the figures for French and Swedish women in Figs. 8.3 and 8.12.

Sweden was actually one of the first countries with a sustained decline in old-age mortality as pointed out by Kannisto (1994). As reflected by the narrowing bands of two weeks and more in Fig. 8.12, contributions of older ages to the increase in life expectancy have been smaller than in some other "vanguard" countries such as France or Japan. Drefahl et al. (2014) demonstrate that different trends for mortality from circulatory diseases were the main reason that Sweden is "losing ground".

Once again we can detect a clear cohort effect in Denmark (Fig. 8.2) for the women born between the two world wars. While the blue colors indicate worsening survival, the detrimental effects were just a few days at most for single ages, much less than what we observed for Belarus or Russia (Figs. 8.1 and 8.11). As we have shown in previous chapters, the United States (Fig. 8.13) also deviated negatively from the international trend observed in many western countries. We can expect that the seemingly interrupted pattern between 1980 and 2000 can be attributed to the severe effects of lung cancer, which we demonstrated in Chap. 7.

Another country where life expectancy improvements were not as high as anticipated during the 1980s and 1990s was the Netherlands (Fig. 8.8). The typical explanation of the smoking epidemic and lung cancer does not hold here, though. Peters (2015, p. 185), for example, argues that "[t]he internationally deviating Dutch trends over the past three decades are not explained by changes in the impact of smoking. Accounting for the impact of smoking revealed simultaneous trend breaks in mortality decline of Dutch men and women around 2002. These breaks occurred most likely due to sudden changes in healthcare expenditures that explained about half of the acceleration in life expectancy during 2000–2009."


## **Chapter 9 Seasonality of Causes of Death**

#### **9.1 Decomposing Seasonal Data**

The majority of deaths in most countries can be attributed to causes that feature a distinct seasonal pattern. Figure 9.1 depicts the relative monthly frequencies of nine selected causes of death in the United States for women and men combined for the years 1959–2014. The reported number of counts in parentheses in the title of each panel is the actual number of deaths. To control for varying lengths of months, the monthly columns in each histogram have been adjusted for a uniform length (30 days). The horizontal reference lines denote the expected value of a uniform distribution (=1/12).
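The adjustment for month length can be sketched as follows. This is an illustrative assumption about how such a standardization might be done, not the authors' code: counts are rescaled to 30-day months and then expressed as relative frequencies. Using 28.25 days for February as an average over pooled years is an assumption of this sketch, as is the function name.

```python
import numpy as np

# Days per month; 28.25 for February is an assumed average over many years
DAYS = np.array([31, 28.25, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])

def adjusted_relative_frequencies(monthly_deaths, days_in_month=DAYS):
    """Rescale monthly death counts to a uniform month length of 30 days
    and return their relative frequencies (summing to 1)."""
    counts = np.asarray(monthly_deaths, dtype=float)
    adjusted = counts * 30.0 / days_in_month
    return adjusted / adjusted.sum()

# Hypothetical check: a constant daily number of deaths (100 per day)
# should give the uniform reference value of 1/12 in every month.
uniform = adjusted_relative_frequencies(100 * DAYS)
```

Without the adjustment, a cause of death with no seasonality at all would still show slightly taller columns for the 31-day months than for February.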

The typical distribution follows a sinusoidal pattern with highest mortality in winter and relatively few cases in the summer. Primarily, those are circulatory diseases (e.g., heart diseases, cerebrovascular diseases)—as shown in the first row of Fig. 9.1—and respiratory diseases such as chronic obstructive pulmonary disease ("COPD"), pneumonia or influenza (Eurowinter Group 1997, 2000; Mackenbach et al. 1992; Kunst et al. 1990; Rau 2007; Yen et al. 2000; Seretakis et al. 1997), which are displayed in three horizontal panels in the middle of Fig. 9.1.

If diseases, and ultimately, mortality occur seasonally, it has been argued that "an environmental factor has to be considered in the etiology of that disease" (Marrero 1983, p. 275).<sup>1</sup> The main environmental factor to trigger higher mortality during winter for circulatory diseases and respiratory diseases (the rows on top of Fig. 9.1) is well understood: temperature. Cold temperatures constrict the blood vessels and change the composition of the blood; furthermore, low temperatures facilitate the survival of bacteria in droplets and increase the risk for pulmonary infections (Eurowinter Group 1997, 2000; Huynen et al. 2001; Rau 2007).

<sup>1</sup>It should be noted that the impact of environmental factors on diseases and deaths is not a finding of the latter part of the twentieth century but has been well known for more than 2000 years. In about 400 BC, Hippocrates started his treatise "On Airs, Waters, and Places" with the following words: "Whoever wishes to investigate medicine properly, should proceed thus: in the first place to consider the seasons of the year, and what effects each of them produces for they are not at all alike, but differ much from themselves in regard to their changes."

R. Rau et al., *Visualizing Mortality Dynamics in the Lexis Diagram*, The Springer Series on Demographic Methods and Population Analysis 44, DOI 10.1007/978-3-319-64820-0\_9

**Fig. 9.1** Seasonality of selected causes of death in the United States for both sexes combined for the years 1959–2014. The counts reported in each panel denote the actual numbers of deaths. The relative frequencies in each histogram are adjusted for a uniform length of 30 days per month (Data source: National Center for Health Statistics and National Bureau of Economic Research)

The patterns observed in the three panels at the bottom of Fig. 9.1 deviate from the ones for circulatory and respiratory diseases above. Motor vehicle accidents do not peak in winter but around July and August. Many people assume that the winter peak in all-cause mortality is due to suicides. The middle panel at the bottom of Fig. 9.1 illustrates that this assumption is wrong for three reasons: (1) The seasonal pattern is less pronounced for suicide than for other causes. (2) If one can speak of a seasonal pattern at all, the peak definitely does not occur during winter. (3) The 30,000 observed deaths are less than 1.5% of all deaths, not enough to shape the pattern for all causes. Lung cancer, whose impact on mortality in the United States was discussed in previous chapters, is, like many malignant neoplasms, an example of no or only negligible seasonality.

Figure 9.1 displays an aggregated picture of monthly deaths. In our analysis we want to investigate, however, whether the seasonal pattern for selected causes of death differs by age as well as whether the seasonal pattern changed over calendar time. The multiplicative model<sup>2</sup> suggested by Eilers et al. (2008) to decompose seasonal data allows such an analysis. The model is, at its core, another application of smoothing data via *P*-splines (Eilers and Marx 1996) as in Chap. 5. It is rather flexible since it allows the estimation not only of counts but also of rates. Exposures are then included as log offsets if the latter is desired, similar to Camarda's approach (2012, 2015) employed in Chaps. 5, 6, and 7. We use the model in its simplest form: The model estimates counts assuming an annual unimodal pattern in the data. Not allowing for bimodal patterns or even higher frequencies should not induce any problems in our analysis since the causes in which we are interested feature clear patterns with one peak and one trough (see Fig. 9.1).

We model the expected value of death counts *y* over age *a* and time *t*, *μta* = *E*(*yta*), to be Poisson distributed using a log-link function

$$\log\left(\mu_{ta}\right) = v_{ta} + f_{ta}\cos\left(\omega t\right) + g_{ta}\sin\left(\omega t\right)$$

with *ω* = 2π/*p*, where *p* is the period. In our case of monthly values, *p* = 12. Further technical details are given in Eilers et al. (2008).

The estimation yields three smooth matrices/surfaces, *vta* for the trend as well as the smooth cosine and sine surfaces *fta* and *gta*. The trend surface captures any major changes in the overall pattern that could be caused by varying population sizes, survival improvements, competing risks, and so on. Our main interest lies neither in this trend surface nor in the actual sine and cosine surfaces. The two latter surfaces allow us, however, to obtain an estimate for the amplitude and the phase over age and time via simple trigonometric functions. The phase denotes the location of the annual peak of the death counts and is expressed as the difference in days from the 1st of January; i.e., a value of 30 corresponds to late January whereas −30 indicates that mortality is highest in the beginning of December.
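One way to carry out this trigonometric step is sketched below. It relies on the identity f cos(ωt) + g sin(ωt) = A cos(ωt − φ) with A = sqrt(f² + g²) and φ = atan2(g, f). Several points are assumptions of this sketch rather than statements from the text: that *t* = 0 corresponds to the 1st of January, that one month counts as 30 days, and that the seasonal factor at the peak is exp(A) because the model is fitted on the log scale. The function name is hypothetical.

```python
import numpy as np

def amplitude_and_phase(f, g, period=12, days_per_unit=30.0):
    """Peak seasonal factor and peak location (days from 1 January)
    from the cosine surface f and the sine surface g."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    omega = 2 * np.pi / period
    # f*cos(wt) + g*sin(wt) = A*cos(wt - phi)
    log_amplitude = np.hypot(f, g)   # A on the log scale
    phi = np.arctan2(g, f)           # phase angle in radians, in (-pi, pi]
    # The cosine term peaks where omega * t = phi (t in months)
    peak_days = phi / omega * days_per_unit
    # Multiplicative excess at the peak relative to the trend (assumed exp(A))
    return np.exp(log_amplitude), peak_days

# Hypothetical cell: 10% excess at the peak, located one month after 1 January
A, phi = np.log(1.1), 2 * np.pi / 12
factor, peak = amplitude_and_phase(A * np.cos(phi), A * np.sin(phi))
fitted_at_peak = factor * 1500  # trend value of 1,500 deaths (made-up number)
```

Because the functions operate element-wise, the same call works on whole age-by-time surfaces. Negative values of the peak location correspond to peaks before the 1st of January.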

<sup>2</sup>Since the logarithm of death counts is modeled, it actually becomes an additive model.

#### **9.2 Results**

Our data and results are displayed in five panels for each selected cause. On the first page for each cause, we show the observed ("raw") monthly numbers of deaths by calendar time and single age (adjusted for a duration of 30 days) in the upper panel. The panel below plots the fit of the model, i.e., the combined pattern of the trend and the sine and the cosine surfaces, which is equivalent to the observed counts minus the (raw) residuals; see, for example, Fig. 9.2 on page 103 for mortality from all causes combined for women. Our main interest is displayed on the second page for each cause. The top panel shows the estimated trend surface *vta*. In the case of seasonality of all-cause mortality among US women (Fig. 9.3), we can see that the number of deaths from that category increases with age and reaches its "hotspot" for octogenarians before the numbers of deaths decline again. As the trend surface plots the seasonally-adjusted *density* of deaths, the lower number of deaths for nonagenarians is the consequence of fewer people being alive rather than a decline in the risk of dying. Even without the additional seasonal component, up to 3,500 women died at a single age during a single month. The height of "excess mortality" is depicted by the amplitude in the middle panel. Higher ages correspond not only to higher mortality; the colors and the contour lines suggest that mortality differences between winter and summer also become larger at higher ages. Increasing seasonality with age was already described by Adolphe Quetelet in 1838 and is typically also found in more contemporary populations (Feinstein 2002; McDowall 1981; Rau and Doblhammer 2003; Rau 2007). Over time we cannot really discern a clear trend. It seems rather that, compared with the annual average, deaths during the peak season are about 10% higher for 70-year-old women in the US and about 15% higher for 90-year-old women. If we multiply the seasonal estimate of a given age and calendar time (e.g., 1.1) with the corresponding cell of the trend surface (e.g., 1,500 deaths), we obtain the fitted value (e.g., 1,650 deaths) shown in the lower panel on the previous page. When the peak season occurs in a year is illustrated in the lower panel. The colors indicate a value slightly below 30. Hence, deaths occur most often at the end of January, regardless of age or calendar year.

The corresponding plots for men are depicted in Figs. 9.4 and 9.5. While male mortality is higher than female mortality at any age, at least in highly developed countries, the seasonal characteristics are rather similar between the two sexes: The proportion of excess deaths during winter varies between 5% at age 50 and 15% at age 90 with no apparent period effect. Deaths among men also peak at the end of January. Those *seasonal* mortality similarities between women and men are not only present for all-cause mortality but also for most causes of death. That is why we restricted ourselves to showing only the results for women; they apply equally to men. We show the results for men only in the case of motor vehicle accidents because far fewer women die of that cause.

The largest subcategory analyzed by us in this chapter is death from heart diseases (see Figs. 9.6 and 9.7, pp. 107–108). Up to 1,300 deaths were recorded at a single age during a single month of a given year. As we can infer from the seasonal decomposition, this is the outcome of about 10 to 15% of excess deaths during the peak season. Here, too, we cannot detect any period effects. In contrast to all-cause mortality with its peak at the end of January, deaths from heart diseases are highest at the end of February since the colors indicate a value of slightly below 60.

**Fig. 9.2** Seasonality of mortality from all causes in the United States, 1959–2014, women, raw counts (adjusted for length of month) and fitted model (Data source: Human Mortality Database)

**Fig. 9.4** Seasonality of mortality from all causes in the United States, 1959–2014, men, raw counts (adjusted for length of month) and fitted model (Data source: Human Mortality Database)

**Fig. 9.6** Seasonality of mortality from heart diseases in the United States, 1959–2014, women, raw counts (adjusted for length of month) and fitted model (Data source: Human Mortality Database)

**Fig. 9.7** Seasonality of mortality from heart diseases in the United States, 1959–2014, women, estimated trend surface (*top panel*), amplitude (*middle panel*), and phase (*bottom panel*) (Data source: Human Mortality Database)

Most deaths from circulatory diseases can be attributed either to heart diseases or to cerebrovascular diseases. We analyzed the seasonal pattern of the latter category for women in Figs. 9.8 and 9.9 and for men in Figs. 9.10 and 9.11 (pp. 112–113). Comparable to heart diseases, the corridor with the largest number of deaths is moving to higher ages; the actual numbers are much smaller than for the other category, though. The extent of the seasonal pattern is remarkably similar to heart diseases. The amplitude is elevated again by about 10% around age 70 with larger fluctuations at higher ages and smaller fluctuations at younger ages. A clear trend over time is again not visible. Cerebrovascular diseases peak a bit earlier than heart diseases as suggested by the lower panels of Figs. 9.9 and 9.11. The highest number of deaths can typically be observed before the 30th day of the year, i.e., sometime between the middle and the end of January.

The Eurowinter group investigated the impact of cold temperatures on mortality about 20 years ago (e.g., Eurowinter Group 1997). They looked at ischaemic heart disease, cerebrovascular diseases, and respiratory diseases. As those three categories are mainly responsible for the seasonal pattern, we also analyzed the pattern for respiratory diseases (see Figs. 9.12 and 9.13 on pages 114 and 115). The observed number of deaths is a bit higher than for cerebrovascular diseases. The seasonal decomposition on the second page shows that this is primarily the outcome of large seasonal fluctuations. Even the highest values in the trend surface on top are smaller than the corresponding values for cerebrovascular diseases. Excess deaths are, however, not only 10 to 15% higher in winter than throughout the year in general. The middle panel clearly illustrates that deaths from diseases such as pneumonia, influenza, COPD, etc. are at least 30% higher during peak season, which occurs at the end of February as the plot for the phase at the bottom illustrates. In contrast to the previously discussed two groups of circulatory diseases, the darker shades of blue during more recent years in the plot of the amplitude for respiratory diseases suggest that seasonal fluctuations became smaller over time.

Although motor vehicle accidents are by no means a major cause-of-death category, we nevertheless decided to include them. In the worst case, 200 people of a given age died during a single month. The raw counts and fitted counts in Fig. 9.14 and the trend surface in Fig. 9.15 demonstrate that the period with the highest numbers of deaths is (thankfully) over. It occurred during the 1970s and 1980s among men aged around 20 years. The same plots also show that those men, born between 1950 and about 1965, suffer from a higher number of deaths at higher ages as well. Since we are not looking at mortality per se but at death counts, this cohort effect is not necessarily the outcome of higher mortality; it could also be caused by the high number of births during those years ("baby boomers"). It is interesting to note, however, that we can also detect a pattern along the 45° line for the seasonal amplitude and for the phase, which should be unaffected by

**Fig. 9.8** Seasonality of mortality from cerebrovascular diseases in the United States, 1959–2014, women, raw counts (adjusted for length of month) and fitted model (Data source: Human Mortality Database)

**Fig. 9.9** Seasonality of mortality from cerebrovascular diseases in the United States, 1959–2014, women, estimated trend surface (*top panel*), amplitude (*middle panel*), and phase (*bottom panel*) (Data source: Human Mortality Database)

**Fig. 9.10** Seasonality of mortality from cerebrovascular diseases in the United States, 1959–2014, men, raw counts (adjusted for length of month) and fitted model (Data source: Human Mortality Database)

**Fig. 9.11** Seasonality of mortality from cerebrovascular diseases in the United States, 1959–2014, men, estimated trend surface (*top panel*), amplitude (*middle panel*), and phase (*bottom panel*) (Data source: Human Mortality Database)

**Fig. 9.12** Seasonality of mortality from respiratory diseases in the United States, 1959–2014, women, raw counts (adjusted for length of month) and fitted model (Data source: Human Mortality Database)

**Fig. 9.13** Seasonality of mortality from respiratory diseases in the United States, 1959–2014, women, estimated trend surface (*top panel*), amplitude (*middle panel*), and phase (*bottom panel*) (Data source: Human Mortality Database)

**Fig. 9.14** Seasonality of motor vehicle accidents in the United States, 1959–2014, men, raw counts (adjusted for length of month) and fitted model (Data source: Human Mortality Database)

**Fig. 9.16** Seasonality of mortality from all cancers in the United States, 1959–2014, women, raw counts (adjusted for length of month) and fitted model (Data source: Human Mortality Database)

**Fig. 9.17** Seasonality of mortality from all cancers in the United States, 1959–2014, women, estimated trend surface (*top panel*), amplitude (*middle panel*), and phase (*bottom panel*) (Data source: Human Mortality Database)

**Fig. 9.18** Seasonality of mortality from lung cancer in the United States, 1959–2014, women, raw counts (adjusted for length of month) and fitted model (Data source: Human Mortality Database)

**Fig. 9.19** Seasonality of mortality from lung cancer in the United States, 1959–2014, women, estimated trend surface (*top panel*), amplitude (*middle panel*), and phase (*bottom panel*) (Data source: Human Mortality Database)

a larger population at risk since the trend surface accounts for it. The panel in the middle of Fig. 9.15 shows the lowest seasonality for the cohorts born before the baby boomers mentioned above. The time of year when most deaths from motor vehicle accidents occur also features a cohort pattern. Whereas deaths from car accidents and similar causes peaked late in fall for older cohorts, the highest numbers of deaths for baby boomers and later generations are recorded at least 120 days before the 1st of January, which corresponds to August of a year.

We want to conclude this chapter by showing that cancers in general (see Figs. 9.16 and 9.17 on pages 118–119) and lung cancer (see Figs. 9.18 and 9.19 on pages 120–121) are examples of non-seasonal diseases. Clearly, the fluctuations throughout a year are barely noticeable, as the middle panels of Figs. 9.17 and 9.19 illustrate.

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Chapter 10 Surface Plots for Cancer Survival**

#### **10.1 Introduction and Overview: The Impact of Cancer on Mortality in the United States**

With 23.4% or 614,348 out of 2,626,418 deaths, heart diseases remained the leading cause of death in the United States in 2014 (CDC/NCHS 2015). Hence, heart diseases contributed most to the age-standardized crude death rate in that year. The absolute level of mortality from heart diseases and other circulatory diseases diminished remarkably during recent decades, as we show in Fig. 10.1. To avoid spurious results from the changing age composition of the population, we used the population of the year 2000 to age-standardize the rates. During the observed period of almost 60 years, mortality from circulatory diseases, as measured by the age-standardized crude death rate, dropped steadily for women as well as for men. This trend of declining mortality from circulatory diseases and rather stagnant cancer mortality may result in a reversal of the leading group of causes of death in the near future, when more people might die of malignant neoplasms than of heart diseases or stroke.
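The age standardization described above can be sketched in a few lines of R. This is a minimal illustration of direct standardization, not the code used for Fig. 10.1: the three age groups, the death rates, and the standard population are invented numbers, and `standardized_rate()` is a hypothetical helper name.

```r
# Direct age standardization: weight age-specific death rates by the age
# structure of one fixed standard population (the book uses the year 2000).
# All numbers below are invented for illustration only.
standardized_rate <- function(rates, standard_pop) {
  stopifnot(length(rates) == length(standard_pop))
  weights <- standard_pop / sum(standard_pop)  # standard age distribution
  sum(rates * weights)                         # weighted mean of the rates
}

# Hypothetical age-specific death rates (per person-year) in two years
rates_1960 <- c(young = 0.0010, middle = 0.010, old = 0.080)
rates_2010 <- c(young = 0.0005, middle = 0.006, old = 0.050)

# Hypothetical standard population, used for both years
standard <- c(young = 600, middle = 300, old = 100)

asdr_1960 <- standardized_rate(rates_1960, standard)
asdr_2010 <- standardized_rate(rates_2010, standard)
```

Because both years are weighted with the same age structure, any difference between `asdr_1960` and `asdr_2010` reflects changes in the age-specific rates only, not changes in the age composition of the population.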

The converging trajectories of these two major causes of death can also be presented from the perspective of cause-elimination life tables (results not shown here; see, for instance, Preston et al. (2001) or Kintner (2004) for the methodology): If circulatory diseases had been non-existent, life expectancy at birth would have been 11 years higher in the 1960s. This gap decreased to about 4 years during the most recent years (3.62 for women, 4.16 for men), whereas the impact of eradicating malignant neoplasms remained relatively stationary over time.

The proportion of deaths from cancer in relation to all causes varies considerably by age as well as over time as we show in Fig. 10.2. The marginal distribution over age is bimodal. A local peak is reached at childhood ages with the main contributing cancers being leukemia and lymphoma as pointed out in Moore and Hurvitz (2009).

**Fig. 10.1** Age-standardized crude death rates by cause and sex (*left panel*: women; *right panel*: men) in the United States from 1959–2014 (Data source: Own estimation based on data from the Human Mortality Database and the National Center for Health Statistics. The population of the year 2000 was used as the standardization population)

The age when the global peak is reached depends on the sex. 40% or more of all deaths of women around age 50 can be attributed to cancers whereas the largest proportion among men is reached between ages 60 and 70.

#### **10.2 Dynamics of Cancer Survival by Cancer Site**

People are usually not healthy and then die suddenly of a chronic, noncommunicable disease such as cancer. In a very simplified manner, we can regard this as a two-step process: (1) People are healthy and then are diagnosed with a certain chronic disease *x*. (2) People who are diagnosed with disease *x* die of *x* or of another disease. The SEER data allow us to investigate developments for both steps. We can look at incidence data for the first step and see how incidence has changed over time by age. This might allow us to make inferences about the successes and failures of cancer *prevention*. We focus, however, on the second step: Analyzing survival from the moment of diagnosis to death. Thus, our focus is rather on the successes and failures of cancer *treatment*.

**Fig. 10.2** Proportion of deaths from cancer in relation to all causes (*left panel*: women; *right panel*: men) in the United States at ages 0–100 from 1959–2010 (Data source: National Center for Health Statistics)

We decided to base our analysis on the five-year survival rate. According to the National Cancer Institute (2017), it is the "percentage of people in a study or treatment group who are alive 5 years after they were diagnosed with or started treatment for a disease, such as cancer."<sup>1</sup> We use three different operationalizations of five-year survival:


<sup>1</sup>Since it is a percentage/proportion, we wonder why the term "rate" has become so commonly used.

2. [. . . ] probability that someone will not die of the diagnosed cancer within 5 years. This second operationalization is sometimes called "corrected survival rate", "net survival" or "disease-specific survival" (Parkin and Hakulinen 1991, p. 167). We use the last term.

3. While the first two approaches describe the risk of dying of any cause (1) or of the diagnosed cancer (2), the third approach compares the survival chances of the diagnosed individuals with the general population. The ratio of observed survival to expected survival is called "relative survival" and can be traced back to Berkson and Gage (1952). Relative survival is "defined as the observed survival of the cancer patients divided by the expected survival of a comparable group from the general population, free from the cancer under study" (Talbäck and Dickman 2011, p. 2626). The observed survival rate for relative survival corresponds to our first approach, i.e., the probability of surviving from all causes of death. The most common methods to estimate relative survival (e.g., Ederer I, Ederer II, Hakulinen) differ with regard to the estimation of expected survival, though (Cho et al. 2011). As shown by Rutherford et al. (2012, p. 20), "[t]aking age into account [. . . ] removes most of the differences between the methods." Since we analyze by single ages and single calendar years, the choice of method to estimate expected survival is less of a problem. We estimated expected survival with life table data from the Human Mortality Database (2017): Expected five year survival for 55 year old women in the year 2000 was the probability to survive age 55 in the year 2000 multiplied by the probability to survive age 56 in the year 2001, . . . multiplied by the probability to survive age 59 in the year 2004. Using the general population instead of the general population free from cancer violates the definition of relative survival. It has been done and justified, however, since the inception of the method (please see Appendix Note 2 of Berkson and Gage (1952) or Ederer et al. (1961)). Also recent papers such as Talbäck and Dickman (2011, p. 2626 and Table 2) argue "that the bias is sufficiently small to be ignorable for most applications." 
Not accounting for the inclusion of cancer patient mortality becomes a problem only for the oldest subjects and follow-up times of 10 years or more. We would also argue that our estimates for five-year survival are sufficiently close to the official estimates. For example, SEER estimates relative survival of women diagnosed with breast cancer aged 50–64 years to be 90.1% during the period 2007–2013.<sup>2</sup> Our results for the most recent 3 years of our analysis varied between 90.05% and 91.08%.
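The computation of expected and relative survival described in the third approach can be sketched in R as follows. This is only an illustration under stated assumptions: the one-year survival probabilities and the observed survival of 0.88 are invented numbers rather than values from the Human Mortality Database or SEER, and `expected_survival()` is a hypothetical helper name.

```r
# Hypothetical one-year survival probabilities of the general population,
# indexed by age (rows) and calendar year (columns). Invented numbers.
ages  <- 55:59
years <- 2000:2004
px <- matrix(rep(c(0.9950, 0.9945, 0.9940, 0.9935, 0.9930), times = 5),
             nrow = 5,
             dimnames = list(as.character(ages), as.character(years)))

# Expected n-year survival: multiply the probabilities along the Lexis
# diagonal, i.e., age x in year t, age x + 1 in year t + 1, and so on.
expected_survival <- function(px, age, year, n = 5) {
  steps <- 0:(n - 1)
  idx <- cbind(as.character(age + steps), as.character(year + steps))
  prod(px[idx])
}

expected <- expected_survival(px, age = 55, year = 2000)

# Relative survival: observed survival of the patients (all causes of
# death) divided by the expected survival of the general population.
observed <- 0.88                      # hypothetical value
relative <- observed / expected
```

Relative survival is always at least as large as observed survival because the expected survival in the denominator cannot exceed one.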

The three approaches are featured in a panel each of Fig. 10.3 for breast cancer. We restricted our analysis of breast cancer to women although men can die from it as well. Our estimates by single year and age for breast cancer as well as for all other cancer sites have been smoothed, again using *P*-splines as outlined in Chap. 5 (Eilers and Marx 1996; Camarda 2012, 2015).

<sup>2</sup>See https://seer.cancer.gov/explorer/application.php?site=55&data\_type=4&stat\_type=5&compareBy=sex&series=race&chk\_sex\_3=3&chk\_race\_1=1&chk\_age\_range\_141=141&chk\_age\_range\_160=160&chk\_stage\_101=101&advopt\_precision=1&showDataFor=age\_range\_160\_and\_stage\_101.

**Fig. 10.3** Five year survival for breast cancer at ages 30–95 from 1973–2005. *Left panel*: Probability to survive for 5 years after diagnosed with breast cancer (any cause). *Middle panel*: Probability of not dying from breast cancer within 5 years after diagnosis. *Right panel*: Five year survival of women diagnosed with breast cancer in relation to five year survival of women in the general population (Data Source: SEER and Human Mortality Database)

The panel on the left denotes the probability to survive for another 5 years after being diagnosed with breast cancer, regardless of the actual cause of death. The figure exhibits an obvious age gradient: Values of 30% or less at ages above 90 arise because these women are not only at an elevated risk of dying from breast cancer; other causes, most notably circulatory diseases, further reduce the chances to survive for five more years. Consequently, the upward trend of the contour lines cannot be interpreted as progress made against the lethality of breast cancer. Still, it provides the answer to the question "How likely is it that I survive for another five years?" for someone who got diagnosed with breast cancer.

While the left panel takes all "exit" possibilities into account, the panel in the middle looks only at death from breast cancer. As a consequence, one minus the depicted value equals the probability to die from breast cancer within 5 years after diagnosis. The rather vertical lines from about age 40 to about age 80 indicate that the chance of surviving breast cancer for at least 5 years has increased over time. For instance, the probability for 60-year-old women who got diagnosed with breast cancer in 1980 to survive 5 years was 80%; the equivalent value in 2000 was higher than 90%. To express it even more positively: The risk of dying was cut in half within less than 20 years (1980: 1 − 0.8 = 20%; 1995: 1 − 0.9 = 10%)!

The panel on the right of Fig. 10.3 shows "relative survival", i.e., it illustrates the relative survival disadvantage of those diagnosed with breast cancer in relation to the general population. A level of one would indicate that there was no difference in the chance to survive for five more years between someone with a cancer diagnosis and the general population. Unfortunately—but also not surprisingly—women with breast cancer have lower survival chances than the general population. We can detect, however, progress over time. The excess risk is less than 10% in recent years for women with breast cancer in comparison to the general population (contour line of 0.9) whereas it was about 30% just 25 years earlier. It is important to point out that the increasing values of the vertical lines suggest a clear period effect: Progress against breast cancer was faster than progress in survival in general, regardless of the age when the woman was diagnosed.

It is theoretically possible to observe relative survival estimates that are higher than one. For instance, this could be the outcome of a selection effect: Persons who take advantage of screening programs and other early preventive measures possibly lead rather healthy lifestyles. If those persons are diagnosed with a cancer that is virtually non-lethal, the survival advantage from their health behavior might be stronger than the additional mortality risk of the malignant neoplasm. Hence, it cannot be concluded that getting diagnosed with a certain cancer could actually improve survival chances. We would argue, though, that the small area at ages 90–95 in 2000 is not the outcome of such a selection effect. Instead, we assume that it is the outcome of random data fluctuations due to small numbers of persons getting diagnosed. For example, 46 women at age 93 were diagnosed with breast cancer in 2000.
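How much random fluctuation a group of 46 diagnosed women allows can be illustrated with a short R sketch. Only the number of diagnoses (46) comes from the text; the number of five-year survivors is an invented value chosen for illustration, and the variable names are hypothetical.

```r
# 46 women aged 93 were diagnosed in 2000 (from the text); the number of
# five-year survivors below is an invented value for illustration.
n_diagnosed <- 46
n_survived  <- 14   # hypothetical

observed_survival <- n_survived / n_diagnosed

# Exact (Clopper-Pearson) 95% confidence interval for the proportion
ci <- binom.test(n_survived, n_diagnosed)$conf.int
ci_width <- ci[2] - ci[1]
```

Under these assumed numbers the interval spans more than 20 percentage points, so a small patch of relative survival above one at the highest ages is well within what chance alone could produce.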

The corresponding estimates for colorectal cancer are depicted in Figs. 10.4 and 10.5 for women and men, respectively (pages 129 & 130). Both sexes feature comparable estimates. The dynamics are somewhat reminiscent of breast cancer

**Fig. 10.4** Five year survival for colorectal cancer at ages 30–95 from 1973–2005. *Left panel*: Probability to survive for 5 years after diagnosed with colorectal cancer (any cause). *Middle panel*: Probability of not dying from colorectal cancer within 5 years after diagnosis. *Right panel*: Five year survival of women diagnosed with colorectal cancer in relation to five year survival of women in the general population (Data Source: SEER and Human Mortality Database)

**Fig. 10.5** Five year survival for colorectal cancer at ages 30–95 from 1973–2005. *Left panel*: Probability to survive for 5 years after diagnosed with colorectal cancer (any cause). *Middle panel*: Probability of not dying from colorectal cancer within 5 years after diagnosis. *Right panel*: Five year survival of men diagnosed with colorectal cancer in relation to five year survival of men in the general population (Data Source: SEER and Human Mortality Database)

albeit on a lower survival level: The chances to survive for another 5 years (left panels) above age 80 tend to follow a horizontal trend over time. This could be caused by at least two factors: Either there was no progress over time, or competing causes of death at those advanced ages are more important. There was, indeed, progress over time, as shown by the panels in the middle of both figures. But despite all this progress, relative survival is still at least 30% lower than in the general population (right panels).

The dominance of shades of green in Figs. 10.6 and 10.7 illustrates that survival chances are much worse for lung cancer than for breast or colorectal cancer. The chances to survive for another 5 years after being diagnosed with lung cancer are less than 30%. Even at very advanced ages, relative survival is very low. On average, it is about 80% lower in comparison to the general population.

Pancreatic cancer, as shown in Fig. 10.8 for women and men, belongs to the cancer sites with the worst survival chances. Living for another 5 years after diagnosis is extremely unlikely with a proportion of survivors of less than 10%. It is therefore not surprising that relative survival is also very low.

The last cancer site we investigated was prostate cancer (see Fig. 10.9). In terms of survival, it lies at the opposite end of the spectrum from pancreatic cancer. The vertical, numerically increasing contour lines in the panel for relative survival provide evidence for a clear period effect: Relative survival improved at all ages at a pace that was faster than improvements in survival in the general population. The most recent estimates show values of relative survival of more than 95%.

Differences in survival do not only exist between cancer sites. An important factor is also the stage when the cancer is diagnosed first. The data used in this study provide stage information for<sup>3</sup>


We only present an example for colorectal cancer, contrasting the survival chances of persons in whom a localized tumor was detected with those of persons with a distant malignant neoplasm. Figure 10.10 presents the results for women; the corresponding plots for men are contained in Fig. 10.11. Both six-panel plots provide clear evidence that early detection of colorectal cancer is, literally, a matter of life

<sup>3</sup>Further details can be found in the field description of variable "SEER Historic Stage A" in the SEER research data record description.

**Fig. 10.6** Five year survival for lung cancer at ages 36–95 from 1973–2005. *Left panel*: Probability to survive for 5 years after diagnosed with lung cancer (any cause). *Middle panel*: Probability of not dying from lung cancer within 5 years after diagnosis. *Right panel*: Five year survival of women diagnosed with lung cancer in relation to five year survival of women in the general population (Data Source: SEER and Human Mortality Database)

**Fig. 10.7** Five year survival for lung cancer at ages 36–95 from 1973–2005. *Left panel*: Probability to survive for 5 years after diagnosed with lung cancer (any cause). *Middle panel*: Probability of not dying from lung cancer within 5 years after diagnosis. *Right panel*: Five year survival of men diagnosed with lung cancer in relation to five year survival of men in the general population (Data Source: SEER and Human Mortality Database)

**Fig. 10.8** Five year survival for pancreatic cancer at ages 50–90 from 1973–2005. *Left column*: women; *right column*: men. *Upper panels*: Probability to survive for 5 years after diagnosed with pancreatic cancer (any cause). *Middle panels:* Probability of not dying from pancreatic cancer within 5 years after diagnosis. *Lower panels*: Five year survival of women or men diagnosed with pancreatic cancer in relation to five year survival of women or men in the general population (Data Source: SEER and Human Mortality Database)

**Fig. 10.9** Five year survival for prostate cancer at ages 52–90 from 1973–2005. *Upper left panel*: Probability to survive for 5 years after diagnosed with prostate cancer (any cause). *Upper right panel*: Probability of not dying from prostate cancer within 5 years after diagnosis. *Lower left panel*: Five year survival of men diagnosed with prostate cancer in relation to five year survival of men in the general population (Data Source: SEER and Human Mortality Database)

**Fig. 10.10** Five year survival for colorectal cancer at ages 60–95 from 1973–2005 by stage. *Upper row*: Stage 1, localized cancer. *Lower row*: Stage 4, distant cancer. *Left panels*: Probability to survive for 5 years after diagnosed with colorectal cancer (any cause). *Middle panels*: Probability of not dying from colorectal cancer within 5 years after diagnosis. *Right panels*: Five year survival of women diagnosed with colorectal cancer in relation to five year survival of women in the general population (Data Source: SEER and Human Mortality Database)

**Fig. 10.11** Five year survival for colorectal cancer at ages 60–89 from 1973–2005 by stage. *Upper row*: Stage 1, localized cancer. *Lower row*: Stage 4, distant cancer. *Left panels*: Probability to survive for 5 years after diagnosed with colorectal cancer (any cause). *Middle panels*: Probability of not dying from colorectal cancer within 5 years after diagnosis. *Right panels*: Five year survival of men diagnosed with colorectal cancer in relation to five year survival of men in the general population (Data Source: SEER and Human Mortality Database)

and death: Relative survival is about 10 to 20% lower than in the general population when the cancer is diagnosed at an early stage (upper three panels in each figure). This excess mortality pales beside that of cancer that has already metastasized at diagnosis (lower three panels in each figure): Only 10% as many people survive the next 5 years as in the general population.

## **Chapter 11 Summary and Outlook**

The goal of this monograph was to show a variety of possibilities to visualize mortality dynamics on the Lexis plane. While we provided examples of raw and smoothed mortality surfaces, our focus was on visualizing rates of mortality improvement ("ROMIs"), i.e., the derivative of age-specific death rates with respect to time. We provided ROMI examples for national populations covered by the Human Mortality Database as well as for selected causes of death in the United States. These "ROMI-plots" were quite instructive for detecting period and cohort effects. We also illustrated how changes in age-specific mortality contribute to a gain (or loss) in life expectancy. In Chap. 9 we decomposed seasonal data for causes of death to investigate whether the seasonal pattern, measured via the amplitude and the peak moment ("phase"), has changed over time or age. The previous chapter dealt with survival chances of persons who were diagnosed with cancer.
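As a reminder of what a rate of mortality improvement measures, the following R sketch computes it for a single age from a short series of death rates. It assumes the common discrete approximation of the time derivative, namely the negative first difference of log death rates between consecutive years; the exact definition used for the plots is given in the earlier chapters. The death rates are invented numbers.

```r
# Hypothetical death rates m(x, t) at one age in consecutive years
mx <- c("2000" = 0.0100, "2001" = 0.0098, "2002" = 0.0095, "2003" = 0.0096)

# Rate of mortality improvement between year t and year t + 1:
# the negative change of the log death rates, so that declining
# mortality yields a positive value (an "improvement").
romi <- -diff(log(mx))

# Expressed in percent per year
romi_percent <- 100 * romi
```

Applying the same calculation to every age of a smoothed mortality surface gives the matrix of values that a ROMI-plot displays on the Lexis plane. For fertility, as in Fig. 11.1, the sign would be flipped so that an increase counts as an improvement.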

Despite the large number of figures, our list is obviously not exhaustive; here we want to provide two more ideas of how the Lexis diagram can be used to illustrate not only mortality dynamics.

Figure 11.1 adapts our ROMI approach to fertility. The top panel contains a surface map of age-specific fertility rates in the eastern part of Germany. Birth counts and corresponding exposures by single year of age and calendar time were downloaded from the Human Fertility Database (2017). The estimates were (again) generated with Camarda's R package for smoothing surfaces with *P*-splines

**Fig. 11.1** Age-specific fertility in eastern Germany from 1956 until 2013. *Top panel*: Smoothed surface map. *Bottom panel*: Surface map of rates of fertility improvement (Data source: Human Fertility Database)

(Camarda 2012, 2015). The lower panel shows rates of fertility improvement, where improvement means an increase in fertility. Thus, the definition is the opposite of the one used for mortality, where a decline was interpreted as an improvement. It is already apparent in the upper panel that fertility dropped considerably a few years after reunification. In 1993 and 1994, the so-called total fertility rate dropped to 0.78 children per woman. This development corresponds to the dark, almost black, vertical area around 1990.<sup>1</sup>

The subsequent recovery of the total fertility rate can be traced back to a strong cohort effect, as illustrated by the red, orange, and yellow triangular area starting in about 1995 at ages 25 and above. It is equally interesting that age-specific fertility of younger women (aged about 20–24) has not gone back to pre-reunification levels but continues to decrease. The figure also shows that a similar development of a sudden decline and recovery was already experienced during the 1960s and 1970s.

The last example we want to provide is for the third main parameter in demography: Migration. In Fig. 11.2 we depict the smoothed age profile of immigration of men to Sweden. For each year from 1968 to 2016, we estimated the relative frequency of each single age. The corresponding plot based on unsmoothed data is included in the appendix in Fig. A.14. We selected this plot because it shows that the age schedule of immigration is rather time-invariant, despite the increase in the number of immigrants coming to Sweden in recent years. Male migrants during the past 50 years were typically 20 to 30 years old when they arrived in Sweden. There was virtually not a single year in which any single age above 45 accounted for more than 1% of the men coming to Sweden; 1% is approximately the share each age would have under a uniform distribution over age.
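The age profile and the uniform benchmark mentioned above can be sketched in R for a single calendar year. The immigration counts are invented (a bell-shaped curve centered in the mid-twenties), not data from Statistics Sweden, and the age range 0–100 is an assumption.

```r
# Hypothetical immigration counts of men by single age in one year
ages   <- 0:100
counts <- round(1000 * dnorm(ages, mean = 26, sd = 8))  # invented shape

# Age profile: relative frequency of each single age in that year
profile <- counts / sum(counts)

# Benchmark: share of each age if all ages were equally likely,
# roughly 1% with 101 single ages
uniform_share <- 1 / length(ages)

# Ages whose share exceeds the uniform benchmark
above_benchmark <- ages[profile > uniform_share]
```

Repeating this for every calendar year and plotting `profile` by age and year gives a surface of the kind shown in Fig. 11.2; in this invented example, no age above 45 exceeds the benchmark.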

The use of the Lexis diagram is not restricted to depicting the dynamics of populations. In principle, any phenomenon that can be classified by age and calendar time could be illustrated. One example could be unemployment. We would argue that a plot created analogously to the ROMI-plots could easily reveal how labor market reforms may affect various age groups differently.

We would also like to point out that a plot in the Lexis diagram is not the answer to every question related to population dynamics; for example, maps might be more suitable for spatial analyses, or circular plots for migration flows, as popularized among demographers by Abel and Sander (2014).

<sup>1</sup>The reader might be surprised that the dark gray areas start already in the late 1980s and may attribute it to the impact of smoothing. Please note that fertility started to decline at several ages already before reunification in 1990. Thus, the gray areas that show up in the late 1980s cannot be traced back completely to the impact of smoothing. Please see Fig. A.13 in the appendix for the corresponding surface maps based on *unsmoothed* age-specific fertility rates.

**Fig. 11.2** (Smoothed) Age-profile (relative frequencies) of immigrations of men to Sweden, 1968–2016 (Data source: Statistics Sweden)

In the introductory chapter we wrote that the main reasons to visualize data can be summarized as exploration, confirmation, and presentation. We assume that more exploratory analyses will be conducted in coming years using dynamic graphics, as their generation is nowadays greatly facilitated by platforms such as node.js. Nevertheless, we remain confident that plots such as the ones contained in this monograph will continue to serve as important tools in all three areas.

## **Additional Figures**

**France, Women**

**Fig. A.1** "Raw" death rates for women in France, 1950–2014 (Data source: Human Mortality Database)

**France, Men**

**Fig. A.2** "Raw" death rates for men in France, 1950–2014 (Data source: Human Mortality Database)

**England & Wales, Women**

**Fig. A.3** "Raw" death rates for women in England & Wales, 1950–2014 (Data source: Human Mortality Database)

**England & Wales, Men**

**Fig. A.4** "Raw" death rates for men in England & Wales, 1950–2013 (Data source: Human Mortality Database)

**Norway, Women**

**Fig. A.5** "Raw" death rates for women in Norway, 1950–2013 (Data source: Human Mortality Database)

**Norway, Men**

**Fig. A.6** "Raw" death rates for men in Norway, 1950–2014 (Data source: Human Mortality Database)

**France, Women**

**Fig. A.7** Smoothed death rates for women in France, 1950–2014 (Data source: Human Mortality Database)

**France, Men**

**Fig. A.8** Smoothed death rates for men in France, 1950–2014 (Data source: Human Mortality Database)

**England & Wales, Women**

**Fig. A.9** Smoothed death rates for women in England & Wales, 1950–2014 (Data source: Human Mortality Database)

**England & Wales, Men**

**Fig. A.10** Smoothed death rates for men in England & Wales, 1950–2013 (Data source: Human Mortality Database)

**Norway, Women**

**Fig. A.11** Smoothed death rates for women in Norway, 1950–2013 (Data source: Human Mortality Database)

**Norway, Men**

**Fig. A.12** Smoothed death rates for men in Norway, 1950–2014 (Data source: Human Mortality Database)

**Fig. A.13** Age-specific fertility in eastern Germany from 1956 until 2013. *Top panel*: Unsmoothed surface map. *Bottom panel*: Surface map of rates of fertility improvement based on unsmoothed age-specific fertility rates (Data source: Human Fertility Database)

**Fig. A.14** (Unsmoothed) Age-profile (relative frequencies) of immigrations of men to Sweden, 1968–2016 (Data source: Statistics Sweden)

## **Software: R package ROMIplot**

## **A.1 Background, Installation and Requirements**

All figures in this monograph were created using R (R Development Core Team 2015), a free software environment for statistical computing and graphics. To make it easier for others to create plots of rates of mortality improvement, the first author of this monograph has written an extension package for R (Rau and Riffe 2015). The current version of the package includes code written by Tim Riffe to read data from the Human Mortality Database.

The package is called ROMIplot and can be installed in the canonical way from any mirror of CRAN, the central repository of R packages:

`install.packages("ROMIplot")`

The package needs to be installed only once, but it has to be loaded in every R session in which it is used:

`library(ROMIplot)`

Apart from the base system and the packages utils, graphics, and grDevices—which are all included in any standard distribution of R—package ROMIplot has two dependencies, i.e., it requires two additional packages to function properly:


## **A.2 Functions**

### *A.2.1 readHMDformat()*

The function readHMDformat() requires four input parameters.


The function returns a list consisting of two data frames.


### *A.2.2 create.Lexis.matrix()*

The function create.Lexis.matrix() requires six input parameters.



**Table A.1** Abbreviations ("CNTRY") and their corresponding country names in the Human Mortality Database


The function returns a matrix that contains the combined number of deaths or exposures for a given combination of calendar year and age. Row names denote ages from minage to maxage; column names denote calendar years from minyear to maxyear.
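The layout of such a matrix can be illustrated with a minimal base R sketch. It is not the code of create.Lexis.matrix(). The helper name `lexis_matrix_sketch`, its arguments `dat` and `column`, and the toy data frame with columns `Year`, `Age`, `Female`, and `Male` are assumptions made for illustration. Only the names minage, maxage, minyear, and maxyear are taken from the description above.

```r
# Hypothetical long-format input: one row per age and calendar year.
# The column names and the counts are invented for illustration.
deaths <- data.frame(
  Year   = rep(2000:2002, each = 3),
  Age    = rep(50:52, times = 3),
  Female = c(10, 12, 14, 11, 13, 15, 9, 12, 16),
  Male   = c(20, 22, 25, 19, 23, 26, 18, 21, 27)
)

# Sketch of a helper that arranges one column as an age-by-year matrix.
lexis_matrix_sketch <- function(dat, column, minage, maxage, minyear, maxyear) {
  ages  <- minage:maxage
  years <- minyear:maxyear
  # Keep only the requested window of ages and calendar years.
  sub <- dat[dat$Age %in% ages & dat$Year %in% years, ]
  # Sum the chosen column within each cell: ages index rows, years index columns.
  tapply(sub[[column]],
         list(factor(sub$Age, levels = ages),
              factor(sub$Year, levels = years)),
         sum)
}

Dx <- lexis_matrix_sketch(deaths, "Female",
                          minage = 50, maxage = 52,
                          minyear = 2000, maxyear = 2002)
Dx
```

In this sketch, cells without any observation in the requested window are returned as NA, so gaps in the input remain visible in the matrix.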

### *A.2.3 ROMI.plot()*

The function ROMI.plot() requires up to four input parameters.

• Dx: A matrix of death counts, expected to be in the format prepared by the function create.Lexis.matrix().


Based on the matrix of (smoothed) death rates, function ROMI.plot() estimates a matrix of rates of mortality improvement, $\rho_{x,t}$, by applying and rearranging the standard equation for continuous population growth. Since we estimate the rates annually, the time step is $\Delta t = 1$:

$$m_{x,t+1} = m_{x,t} \, e^{-\rho_{x,t} \, \Delta t}; \quad \rho_{x,t} = -\log_{e}\left(\frac{m_{x,t+1}}{m_{x,t}}\right).$$
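The calculation can be sketched in a few lines of base R. This is a minimal illustration of the formula for a one-year step, not the code used inside ROMI.plot(). The function name `romi_sketch`, the toy death rates, and the choice to label each rate by its starting year are assumptions.

```r
# Hypothetical (smoothed) death rates: rows are ages, columns are calendar years.
mx <- matrix(
  c(0.0100, 0.0098, 0.0095,
    0.0200, 0.0202, 0.0190),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("60", "61"), c("2000", "2001", "2002"))
)

# rho[x, t] = -log(m[x, t + 1] / m[x, t]) for adjacent calendar years.
romi_sketch <- function(mx) {
  n <- ncol(mx)
  # Divide each column (year t + 1) by the column to its left (year t).
  rho <- -log(mx[, -1, drop = FALSE] / mx[, -n, drop = FALSE])
  # Label each rate by its starting year t (an assumption of this sketch).
  colnames(rho) <- colnames(mx)[-n]
  rho
}

rho <- romi_sketch(mx)
round(100 * rho, 2)  # rates of mortality improvement in percent per year
```

A positive value indicates falling mortality between two adjacent years, and a negative value indicates rising mortality. The result has one column fewer than the input because each rate compares two adjacent years.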

In most applications, the researcher's main interest is not the returned matrix but the plot that is produced as a side effect. Please note that we used the same color scheme as in the present volume, but this is only a suggestion. Our package is free software, and only the most fundamental elements have been included in the current version, so anyone should feel invited to modify and possibly also improve it.
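Readers who prefer a different color scheme can plot a matrix of rates of mortality improvement themselves. The following base R sketch shows one possible approach and does not reproduce the output of ROMI.plot(). The simulated matrix, the palette, and the limits of plus and minus 5% are assumptions chosen for illustration.

```r
# Hypothetical matrix of rates of mortality improvement
# (ages in rows, calendar years in columns), simulated for illustration.
set.seed(1)
ages  <- 50:59
years <- 1990:1999
rho <- matrix(rnorm(length(ages) * length(years), mean = 0.01, sd = 0.01),
              nrow = length(ages), dimnames = list(ages, years))

# Class limits symmetric around zero, with one color per class.
limit  <- 0.05
breaks <- seq(-limit, limit, length.out = 11)
cols   <- hcl.colors(length(breaks) - 1, palette = "Blue-Red")

# Clamp extreme values so that they fall into the outermost classes.
rho_clamped <- pmax(pmin(rho, limit), -limit)

# image() maps the rows of z to the x-axis, so the matrix is transposed
# to put calendar years on the x-axis and ages on the y-axis.
image(x = years, y = ages, z = t(rho_clamped),
      breaks = breaks, col = cols,
      xlab = "Calendar year", ylab = "Age")
```

Keeping the class limits symmetric around zero ensures that the midpoint of the palette separates mortality improvement from mortality deterioration.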

## **References**



